pith. machine review for the scientific record.

arxiv: 2605.08084 · v1 · submitted 2026-05-08 · 💻 cs.RO · cs.CV

Recognition: 1 theorem link · Lean Theorem

123D: Unifying Multi-Modal Autonomous Driving Data at Scale

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:51 UTC · model grok-4.3

classification 💻 cs.RO · cs.CV
keywords autonomous driving · multi-modal data · dataset unification · event streams · 3D object detection · reinforcement learning · sensor synchronization · data consolidation

The pith

Treating each sensor modality as an independent timestamped event stream lets one API handle eight incompatible autonomous driving datasets at once.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents 123D as a framework that stores every modality from driving sensors as its own independent stream of timestamped events rather than enforcing fixed rates or synchronization schemes. This design removes format barriers that have kept datasets separate, allowing eight real-world collections spanning 3,300 hours and 90,000 kilometers plus one synthetic set to be loaded and queried through identical calls. The unified access supports direct statistical comparisons of annotations, pose accuracy, and calibration across sources. It also enables new experiments such as training 3D object detectors on data from multiple collections and running reinforcement learning for planning on the combined corpus.
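
A minimal sketch of what one such per-modality log could look like in the Apache Arrow format the toolkit adopts (see Figure 2); the schema and field names below are illustrative assumptions, not 123D's actual layout.

    # Sketch only: schema and file layout are assumptions, not 123D's format.
    import pyarrow as pa
    import pyarrow.ipc as ipc

    # Each modality is its own stream of timestamped events, at its native rate.
    camera = pa.table({
        "timestamp_us": pa.array([33_366, 66_733, 100_100], type=pa.int64()),
        "payload": pa.array([b"<jpeg bytes>"] * 3, type=pa.binary()),
    })
    lidar = pa.table({  # a different, slower rate is fine: streams are independent
        "timestamp_us": pa.array([50_000, 150_000], type=pa.int64()),
        "payload": pa.array([b"<packed points>"] * 2, type=pa.binary()),
    })

    # One log file per stream; consumers align streams at query time, not at write time.
    for name, table in [("camera", camera), ("lidar", lidar)]:
        with ipc.new_file(f"{name}.arrow", table.schema) as writer:
            writer.write_table(table)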

Core claim

By representing all modalities as independent timestamped event streams, 123D unifies multi-modal data from fragmented datasets into a single API that supports both synchronous and asynchronous access, enabling the consolidation of over 3,300 hours of real-world driving data and demonstrating applications in cross-dataset detection and reinforcement learning for planning.

What carries the argument

The independent timestamped event stream representation for each modality, which decouples timing from any prescribed rate and permits flexible synchronous or asynchronous querying across arbitrary datasets without custom loaders.
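
Concretely, the abstraction can be pictured as below: a minimal sketch assuming a stream is nothing more than a time-sorted list of (timestamp, payload) events, with class and method names that are illustrative rather than py123d's actual API.

    # Minimal sketch of the event-stream abstraction; names are illustrative.
    from bisect import bisect_left, bisect_right
    from dataclasses import dataclass
    from typing import Any, List, Optional

    @dataclass
    class Event:
        timestamp_us: int  # native sensor timestamp; no prescribed rate
        payload: Any       # raw modality content (image, point cloud, boxes, ...)

    class EventStream:
        """One modality from one log, sorted by its own native timestamps."""

        def __init__(self, events: List[Event]):
            self.events = sorted(events, key=lambda e: e.timestamp_us)
            self._ts = [e.timestamp_us for e in self.events]

        def latest_at(self, t_us: int) -> Optional[Event]:
            """Asynchronous access: most recent event at or before t_us."""
            i = bisect_right(self._ts, t_us)
            return self.events[i - 1] if i else None

        def window(self, t0_us: int, t1_us: int) -> List[Event]:
            """Synchronous access: every event inside [t0_us, t1_us]."""
            return self.events[bisect_left(self._ts, t0_us):bisect_right(self._ts, t1_us)]

Synchronizing across modalities then reduces to calling latest_at with one reference timestamp on several streams, while asynchronous consumers simply iterate each stream at its native rate; no per-dataset loader logic is involved.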

If this is right

  • Detectors trained on the combined data can be evaluated for generalization across different collection conditions and annotation conventions.
  • Reinforcement learning agents for driving policies gain access to a much larger and more diverse set of experiences drawn from the full 3,300-hour corpus.
  • Researchers can perform systematic audits of pose and calibration accuracy that were previously difficult to compare across sources.
  • Analysis and visualization tools become available for the entire collection without writing custom code for each original format.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same event-stream abstraction could extend to other robotics domains such as robotic manipulation or aerial navigation where sensor rates also vary widely.
  • If adopted as a release format, future datasets could avoid the fragmentation problem from the start by providing data directly in this structure.
  • The unified real and synthetic data opens concrete paths for controlled sim-to-real transfer experiments in both perception and planning.
  • Large-scale pretraining of driving policies on the full 90,000 km collection becomes feasible in the same way language models pretrain on text corpora.

Load-bearing premise

That storing each modality as an independent timestamped event stream preserves all necessary information and allows accurate synchronization or asynchronous access without introducing errors or losing fidelity from the original datasets' different rates and annotation conventions.

What would settle it

A side-by-side comparison showing that 123D-loaded synchronized frames from two datasets produce object labels or timing offsets that differ from the native loaders of those datasets would falsify the claim of lossless unification.
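
One way to run that test, sketched under the assumption that both the native loader and the 123D loader can be reduced to aligned (timestamp, label set) pairs for the same log; compare_loaders is a hypothetical helper, not part of either toolkit.

    # Hypothetical falsification harness; both inputs are assumed to be lists of
    # (timestamp_us, labels) pairs for the same synchronized frames of one log.
    def compare_loaders(native, unified, tol_us=0):
        """Collect timing offsets and label mismatches between two loaders."""
        issues = []
        for (t_nat, labels_nat), (t_uni, labels_uni) in zip(native, unified):
            if abs(t_nat - t_uni) > tol_us:
                issues.append(("timing", t_nat, t_uni))
            if set(labels_nat) != set(labels_uni):
                issues.append(("labels", t_nat, set(labels_nat) ^ set(labels_uni)))
        return issues  # any entry here would falsify lossless unification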

Figures

Figures reproduced from arXiv: 2605.08084 by Andreas Geiger, Bastian Berle, Boris Ivanovic, Changhui Jing, Daniel Dauner, Holger Caesar, Jiabao Wang, Kashyap Chitta, Long Nguyen, Maximilian Igl, Tianyu Li, Valentin Charraut, Yiyi Liao.

Figure 1: 123D. An open-source toolkit to consolidate fragmented driving data through a unified format for modalities such as annotations, sensors, and HD maps. By overcoming this fragmentation, 123D enables a wide range of cross-dataset applications and research directions, including scene reconstruction, cross-vehicle learning, and reinforcement-learning-based planning. view at source ↗
Figure 2: Architecture. We parse existing datasets from cloud/local storage, or collect data in simulation that we write to our unified Apache Arrow [24] log format (Sec. 3.1). The scene and map API enable access to logs, and can be passed to a dataloader, viewer, or other application (Sec. 3.2). view at source ↗
Figure 3: 3D Viewer. Analyzing driving recordings requires frequent visual inspections. We show visualizations of supported datasets in 3a-3i from our interactive 3D viewer based on Viser [71]. view at source ↗
Figure 4: Annotation of bounding boxes. We compare ego distance, speed, and acceleration (rows) over different semantic categories, grouped into vehicle, person, two-wheeler, obstacles, and other miscellaneous classes (columns). The histograms show frequencies in the range of 0-1 on a log scale. view at source ↗
Figure 5: Multi-view 3D Object Detection. Per-dataset nuScenes detection score (NDS) for PETR [49] and BEVFormer-S [44] for vehicle detection. We evaluate on held-out validation splits of each dataset and train on nuScenes, WOD-Perc., Av2-Sens., nuPlan, CARLA, or a uniform mixture of these five (Mixed-5, dashed). PandaSet, KITTI-360, and PAI-AV are never seen during training. view at source ↗
Figure 6: PufferDrive Planning [16]. Results. We summarize the results on held-out test scenes. view at source ↗
read the original abstract

The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate eight real-world driving datasets spanning 3,300 hours and 90,000 kilometers, together with a synthetic dataset with configurable collection scripts, and provide tools for data analysis and visualization. We conduct a systematic study comparing annotation statistics and assessing each dataset's pose and calibration accuracy. Further, we showcase two applications 123D enables: cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions. Code and documentation are available at https://github.com/kesai-labs/py123d.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents 123D, an open-source framework that unifies multi-modal autonomous driving data from eight real-world datasets (3,300 hours, 90,000 km) and one synthetic dataset via a single API. Each modality is stored as an independent timestamped event stream with no prescribed rate to support synchronous or asynchronous access despite differing original rates, formats, and annotation conventions. The work includes tools for analysis and visualization, a systematic comparison of annotation statistics and pose/calibration accuracy, and two applications: cross-dataset 3D object detection transfer and reinforcement learning for planning.

Significance. If the unification preserves fidelity, 123D would be a substantial contribution by lowering barriers to large-scale cross-dataset training and evaluation in autonomous driving. The open-source release, scale of consolidation, and demonstrated applications add practical value; the systematic accuracy study is a positive step toward reproducibility.

major comments (2)
  1. [§3] §3 (Data Unification and Event Streams): The central claim that representing every modality as an independent timestamped event stream preserves all necessary information and permits exact reconstruction of original synchronous tuples without interpolation artifacts or loss of fidelity is load-bearing for the cross-dataset applications, yet the manuscript provides no quantitative validation (e.g., timestamp round-trip error, rate-mismatch reconstruction error, or annotation IoU before/after schema mapping) across the eight heterogeneous datasets.
  2. [§4] §4 (Annotation Statistics and Accuracy Study): The systematic comparison of annotation conventions and pose/calibration accuracy is useful, but the paper does not report how inconsistent 3D box conventions or traffic-light taxonomies were mapped into the common schema, nor any drift metrics; this directly affects the reliability of the cross-dataset 3D detection transfer results shown later.
minor comments (2)
  1. [§5] The abstract and §5 mention 'configurable collection scripts' for the synthetic dataset, but the manuscript does not specify the exact parameters or randomization ranges used, which would aid reproducibility.
  2. Figure captions for the visualization tools could more explicitly state which modalities are overlaid in each panel to improve clarity for readers unfamiliar with the original dataset formats.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the practical value of 123D. We address the major comments point by point below, indicating where revisions will be made to improve rigor and transparency.

read point-by-point responses
  1. Referee: [§3] §3 (Data Unification and Event Streams): The central claim that representing every modality as an independent timestamped event stream preserves all necessary information and permits exact reconstruction of original synchronous tuples without interpolation artifacts or loss of fidelity is load-bearing for the cross-dataset applications, yet the manuscript provides no quantitative validation (e.g., timestamp round-trip error, rate-mismatch reconstruction error, or annotation IoU before/after schema mapping) across the eight heterogeneous datasets.

    Authors: We agree that explicit quantitative validation would strengthen the central claim. The event-stream design stores each modality with its native timestamps and raw content, performing no interpolation, resampling, or data alteration; reconstruction of original tuples is achieved by time-window queries on the independent streams (sketched after these responses). However, the submitted manuscript indeed lacks the requested metrics. In revision we will add a dedicated validation subsection to §3 that reports timestamp round-trip errors and synchronous reconstruction fidelity on representative subsets of the eight datasets, plus before/after annotation IoU statistics for the schema mappings. revision: partial

  2. Referee: [§4] §4 (Annotation Statistics and Accuracy Study): The systematic comparison of annotation conventions and pose/calibration accuracy is useful, but the paper does not report how inconsistent 3D box conventions or traffic-light taxonomies were mapped into the common schema, nor any drift metrics; this directly affects the reliability of the cross-dataset 3D detection transfer results shown later.

    Authors: We concur that the mapping procedures and any drift metrics must be documented for reproducibility. The current text describes the target schema but omits the concrete rules used for 3D box coordinate-frame unification, orientation conventions, and traffic-light category harmonization. In the revised §4 we will insert a table and textual description of these mappings together with any available quantitative drift or accuracy metrics drawn from the original dataset releases and our own analysis. This addition will directly support interpretation of the cross-dataset transfer experiments. revision: yes
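
To make the mechanism in response 1 concrete, the sketch below rebuilds a synchronous tuple purely by nearest-event lookup over independent streams, with no interpolation or resampling; the stream format and tolerance are assumptions for illustration.

    # Sketch of synchronous-tuple reconstruction via time-window queries only.
    # Each stream is assumed to be a sorted list of (timestamp_us, payload) pairs.
    def reconstruct_tuple(streams, t_us, tol_us=50_000):
        """For each modality, pick the event nearest t_us within +/- tol_us."""
        frame = {}
        for name, events in streams.items():
            best = min(events, key=lambda e: abs(e[0] - t_us), default=None)
            if best is not None and abs(best[0] - t_us) <= tol_us:
                frame[name] = best
        return frame  # compare against the native loader's synchronized sample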

Circularity Check

0 steps flagged

No circularity: software framework with no derivations or self-referential reductions

full rationale

The paper describes a data unification framework that stores modalities as independent timestamped event streams to enable cross-dataset access. No mathematical derivations, fitted parameters, predictions, uniqueness theorems, or ansatzes are present. Claims concern the existence of the released tool, its coverage of eight datasets, and two downstream applications; these are externally verifiable via the open-source code and data rather than reducing to self-citations or inputs by construction. The central design choice is presented as an engineering decision, not derived from prior results in a load-bearing way.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Because 123D is a data unification framework, its central claim rests on the domain assumption that multi-modal sensor streams can be losslessly represented as independent timestamped events and that the provided tools faithfully expose original dataset properties.

axioms (1)
  • domain assumption Multi-modal driving data from heterogeneous datasets can be represented as independent timestamped event streams without loss of synchronization or annotation fidelity.
    Invoked to justify the single-API design and cross-dataset access.

pith-pipeline@v0.9.0 · 5587 in / 1424 out tokens · 47991 ms · 2026-05-11T01:51:58.627724+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

  1. [1]

    High performance i/o for large scale deep learning

    Alex Aizman, Gavin Maltby, and Thomas Breuel. High performance i/o for large scale deep learning. In 2019 IEEE International Conference on Big Data (Big Data), pages 5965–5967. IEEE, 2019

  2. [2]

    π0: A vision-language-action flow model for general robot control

    Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv.org, 2024

  3. [3]

    G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000

  4. [4]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS), 2020

  5. [5]

    Lerobot: State-of-the-art machine learning for real-world robotics in pytorch

    Remi Cadene, Simon Alibert, Alexander Soare, Quentin Gallouedec, Adil Zouitine, Steven Palma, Pepijn Kooijmans, Michel Aractingi, Mustafa Shukor, Dana Aubakirova, Martino Russi, Francesco Capuano, Caroline Pascal, Jade Choghari, Jess Moss, and Thomas Wolf. Lerobot: State-of-the-art machine learning for real-world robotics in pytorch. https://github.com/h...

  6. [6]

    nuscenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020

  7. [7]

    Pseudo-simulation for autonomous driving

    Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. Pseudo-simulation for autonomous driving. In Proc. Conf. on Robot Learning (CoRL), 2025

  8. [8]

    Sam 3: Segment anything with concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. Sam 3: Segment anything with concepts. Proc. of the International Conf. on Learning Representations (ICLR), 2026

  9. [9]

    Unified domain generalization and adaptation for multi-view 3d object detection

    Gyusam Chang, Jiwon Lee, Donghyun Kim, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sujin Jang, and Sangpil Kim. Unified domain generalization and adaptation for multi-view 3d object detection. Advances in Neural Information Processing Systems (NeurIPS), 2024

  10. [10]

    Argoverse: 3d tracking and forecasting with rich maps

    Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019

  11. [11]

    Olmix: A framework for data mixing throughout lm development

    Mayee F Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi, Christopher Ré, Luca Soldaini, and Kyle Lo. Olmix: A framework for data mixing throughout lm development. arXiv.org, 2026

  12. [12]

    Omnire: Omni urban scene reconstruction

    Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al. Omnire: Omni urban scene reconstruction. Proc. of the International Conf. on Learning Representations (ICLR), 2025

  13. [13]

    Masked-attention mask transformer for universal image segmentation

    Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022

  14. [14]

    Open x-embodiment: Robotic learning datasets and rt-x models

    OX-Embodiment Collaboration, Abby O’Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), 2023

  15. [15]

    MMDetection3D: OpenMMLab next-generation platform for general 3D object detection

    MMDetection3D Contributors. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d, 2020

  16. [16]

    PufferDrive: A fast and friendly driving simulator for training and evaluating RL agents, 2026

    Daphne Cornelisse*, Spencer Cheng*, Pragnay Mandavilli, Julian Hunt, Kevin Joseph, Waël Doulazmi, Valentin Charraut, Aditya Gupta, Joseph Suarez, and Eugene Vinitsky. PufferDrive: A fast and friendly driving simulator for training and evaluating RL agents, 2026. URL https://github.com/Emerge-Lab/PufferDrive

  17. [17]

    Robust autonomy emerges from self-play

    Marco Cusumano-Towner, David Hafner, Alex Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor Killian, Stuart Bowers, Ozan Sener, et al. Robust autonomy emerges from self-play. Proc. of the International Conf. on Machine Learning (ICML), 2025

  18. [18]

    Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking

    Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems (NeurIPS), 2024

  19. [19]

    Refav: Towards planning-centric scenario mining

    Cainan Davidson, Deva Ramanan, and Neehar Peri. Refav: Towards planning-centric scenario mining. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026

  20. [20]

    CARLA: An open urban driving simulator

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Proc. Conf. on Robot Learning (CoRL), 2017

  21. [21]

    Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset

    Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R Qi, Yin Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021

  22. [22]

    Unitraj: A unified framework for scalable vehicle trajectory prediction

    Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, and Alexandre Alahi. Unitraj: A unified framework for scalable vehicle trajectory prediction. In Proc. of the European Conf. on Computer Vision (ECCV), 2024

  23. [23]

    Common crawl

    Common Crawl Foundation. Common crawl. https://commoncrawl.org, 2026

  24. [24]

    Apache arrow

    The Apache Software Foundation. Apache arrow. https://github.com/apache/arrow, 2026

  25. [25]

    Are we ready for autonomous driving? The KITTI vision benchmark suite

    Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2012

  26. [26]

    The llama 3 herd of models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv.org, 2024

  27. [27]

    Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research

    Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, et al. Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research. Advances in Neural Information Processing Systems (NeurIPS), 2023

  28. [28]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016

  29. [29]

    One thousand and one hours: Self-driving motion prediction dataset

    John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, and Peter Ondruska. One thousand and one hours: Self-driving motion prediction dataset. In Proc. Conf. on Robot Learning (CoRL), 2021

  30. [30]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2023

  31. [31]

    Opengis® implementation standard for geographic information - simple feature access - part 1: Common architecture

    Open Geospatial Consortium Inc. Opengis® implementation standard for geographic information - simple feature access - part 1: Common architecture. https://www.ogc.org/standards/sfa, 2011

  32. [32]

    ISO 8855:2011(en) Road vehicles — Vehicle dynamics and road-holding ability — Vocabulary

    International Organization for Standardization. ISO 8855:2011(en) Road vehicles — Vehicle dynamics and road-holding ability — Vocabulary. https://www.iso.org/obp/ui/en/#iso:std:iso:8855:, 2011

  33. [33]

    trajdata: A unified interface to multiple human trajectory datasets

    Boris Ivanovic, Guanyu Song, Igor Gilitschenski, and Marco Pavone. trajdata: A unified interface to multiple human trajectory datasets. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  34. [34]

    Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving

    Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Advances in Neural Information Processing Systems (NeurIPS), 2024

  35. [35]

    Towards learning-based planning: The nuplan benchmark for real-world autonomous driving

    Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, et al. Towards learning-based planning: The nuplan benchmark for real-world autonomous driving. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), 2024

  36. [36]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4), 2023

  37. [37]

    Coin3d: Revisiting configuration-invariant multi-camera 3d object detection

    Zhaonian Kuang, Rui Ding, Haotian Wang, Xinhu Zheng, Meng Yang, and Gang Hua. Coin3d: Revisiting configuration-invariant multi-camera 3d object detection. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026

  38. [38]

    Terraseg: Self-supervised ground segmentation for any lidar

    Ted Lentsch, Santiago Montiel-Marín, Holger Caesar, and Dariu M Gavrila. Terraseg: Self-supervised ground segmentation for any lidar. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026

  39. [39]

    Str: A simple and efficient algorithm for r-tree packing

    Scott T Leutenegger, Mario A Lopez, and Jeffrey Edgington. Str: A simple and efficient algorithm for r-tree packing. In Proceedings 13th International Conference on Data Engineering, pages 497–506. IEEE, 1997

  40. [40]

    Datasets: A community library for natural language processing

    Quentin Lhoest, Albert Villanova Del Moral, Yacine Jernite, Abhishek Thakur, Patrick Von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. Datasets: A community library for natural language processing. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

  41. [41]

    Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling

    Quanyi Li, Zhenghao Mark Peng, Lan Feng, Zhizheng Liu, Chenda Duan, Wenjie Mo, and Bolei Zhou. Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling. Advances in Neural Information Processing Systems (NeurIPS), 2023

  42. [42]

    Mtgs: Multi-traversal gaussian splatting

    Tianyu Li, Yihang Qiu, Zhenhua Wu, Carl Lindström, Peng Su, Matthias Nießner, and Hongyang Li. Mtgs: Multi-traversal gaussian splatting. arXiv.org, 2025

  43. [43]

    Tactics2d: A highly modular and extensible simulator for driving decision-making

    Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Jing Yang, Yeqiang Qian, Chunxiang Wang, and Ming Yang. Tactics2d: A highly modular and extensible simulator for driving decision-making. IEEE Transactions on Intelligent Vehicles (T-IV), 2024

  44. [44]

    Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. Proc. of the European Conf. on Computer Vision (ECCV), 2022

  45. [45]

    Is ego status all you need for open-loop end-to-end autonomous driving?

    Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, and Jose M Alvarez. Is ego status all you need for open-loop end-to-end autonomous driving? In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2024

  46. [46]

    Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d

    Yiyi Liao, Jun Xie, and Andreas Geiger. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2022

  47. [47]

    Depth anything 3: Recovering the visual space from any views

    Haotong Lin, Sili Chen, Junhao Liew, Donny Y Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views. arXiv.org, 2025

  48. [48]

    A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook

    Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, and Alois C Knoll. A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook. IEEE Transactions on Intelligent Vehicles (T-IV), 2024

  49. [49]

    Petr: Position embedding transformation for multi-view 3d object detection

    Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. Petr: Position embedding transformation for multi-view 3d object detection. In Proc. of the European Conf. on Computer Vision (ECCV), 2022

  50. [50]

    Lead: Minimizing learner-expert asymmetry in end-to-end driving

    Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, and Kashyap Chitta. Lead: Minimizing learner-expert asymmetry in end-to-end driving. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026

  51. [51]

    PhysicalAI-Autonomous-Vehicles

    NVIDIA. PhysicalAI-Autonomous-Vehicles. https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles, 2025. Hugging Face dataset

  52. [52]

    NVIDIA DRIVE Hyperion: L4-Ready autonomous vehicle platform

    NVIDIA. NVIDIA DRIVE Hyperion: L4-Ready autonomous vehicle platform. https://www.nvidia.com/en-us/solutions/autonomous-vehicles/drive-hyperion/, 2026. NVIDIA product page

  53. [53]

    PhysicalAI-Autonomous-Vehicles-NCore

    NVIDIA. PhysicalAI-Autonomous-Vehicles-NCore. https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore, 2026. Hugging Face dataset

  54. [54]

    Fastgs: Training 3d gaussian splatting in 100 seconds

    Shiwei Ren, Tianci Wen, Yongchun Fang, and Biao Lu. Fastgs: Training 3d gaussian splatting in 100 seconds. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026

  55. [55]

    Drivelm: Driving with graph visual question answering

    Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In Proc. of the European Conf. on Computer Vision (ECCV), 2024

  56. [56]

    Scalability in perception for autonomous driving: Waymo open dataset

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020

  57. [57]

    Openpcdet: An open-source toolbox for 3d object detection from point clouds

    OpenPCDet Development Team. Openpcdet: An open-source toolbox for 3d object detection from point clouds. https://github.com/open-mmlab/OpenPCDet, 2020

  58. [58]

    Neurad: Neural rendering for autonomous driving

    Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. Neurad: Neural rendering for autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2024

  59. [59]

    Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way

    Ignacio Vizzo, Tiziano Guadagnino, Benedikt Mersch, Louis Wiesmann, Jens Behley, and Cyrill Stachniss. Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way. IEEE Robotics and Automation Letters (RA-L), 8(2):1029–1036, 2023

  60. [60]

    Towards domain generalization for multi-view 3d object detection in bird-eye-view

    Shuo Wang, Xinhai Zhao, Hai-Ming Xu, Zehui Chen, Dameng Yu, Jiahao Chang, Zhen Yang, and Feng Zhao. Towards domain generalization for multi-view 3d object detection in bird-eye-view. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2023

  61. [61]

    Train in germany, test in the usa: Making 3d object detectors generalize

    Yan Wang, Xiangyu Chen, Yurong You, Li Erran Li, Bharath Hariharan, Mark Campbell, Kilian Q Weinberger, and Wei-Lun Chao. Train in germany, test in the usa: Making 3d object detectors generalize. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020

  62. [62]

    Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail

    Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Tong Che, Ke Chen, Yuxiao Chen, Jenna Diamond, Yifan Ding, Wenhao Ding, et al. Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail. arXiv.org, 2025

  63. [63]

    Safe, routine, ready: Autonomous driving in five new cities

    Waymo. Safe, routine, ready: Autonomous driving in five new cities. https://waymo.com/blog/2025/11/safe-routine-ready-autonomous-driving-in-new-cities/, 2025. Waymo blog post

  64. [64]

    Beginning fully autonomous operations with the 6th-generation Waymo driver

    Waymo. Beginning fully autonomous operations with the 6th-generation Waymo driver. https://waymo.com/blog/2026/02/ro-on-6th-gen-waymo-driver/, 2026. Waymo blog post

  65. [65]

    Crossing the pond and beyond: Generalizable AI driving for global deployment

    Wayve. Crossing the pond and beyond: Generalizable AI driving for global deployment. https://wayve.ai/thinking/multi-country-generalization/, 2025. Wayve blog post

  66. [66]

    Argoverse 2: Next generation datasets for self-driving perception and forecasting

    Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. Advances in Neural Information Processing Systems (NeurIPS), 2021

  67. [67]

    Pandaset: Advanced sensor suite dataset for autonomous driving

    Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In Proc. IEEE Conf. on Intelligent Transportation Systems (ITSC). IEEE, 2021

  68. [68]

    Wod-e2e: Waymo open dataset for end-to-end driving in challenging long-tail scenarios

    Runsheng Xu, Hubert Lin, Wonseok Jeon, Hao Feng, Yuliang Zou, Liting Sun, John Gorman, Ekaterina Tolstaya, Sarah Tang, Brandyn White, et al. Wod-e2e: Waymo open dataset for end-to-end driving in challenging long-tail scenarios. arXiv.org, 2025

  69. [69]

    Improving traffic signal data quality for the waymo open motion dataset

    Xintao Yan, Erdao Liang, Jiawei Wang, Haojie Zhu, and Henry X Liu. Improving traffic signal data quality for the waymo open motion dataset. Transportation Research Part C: Emerging Technologies, 183:105476, 2026

  70. [70]

    Qwen2.5 technical report

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, et al. Qwen2.5 technical report. arXiv.org, 2024

  71. [71]

    Viser: Imperative, web-based 3d visualization in python

    Brent Yi, Chung Min Kim, Justin Kerr, Gina Wu, Rebecca Feng, Anthony Zhang, Jonas Kulhanek, Hongsuk Choi, Yi Ma, Matthew Tancik, et al. Viser: Imperative, web-based 3d visualization in python. arXiv.org, 2025

  72. [72]

    Object detection with a unified label space from multiple datasets

    Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, and Ying Wu. Object detection with a unified label space from multiple datasets. In Proc. of the European Conf. on Computer Vision (ECCV), 2020

  73. [73]

    Simple multi-dataset detection

    Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Simple multi-dataset detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022