Recognition: 1 theorem link · Lean Theorem
123D: Unifying Multi-Modal Autonomous Driving Data at Scale
Pith reviewed 2026-05-11 01:51 UTC · model grok-4.3
The pith
Treating each modality as an independent stream of timestamped events lets one API handle eight incompatible autonomous driving datasets at once.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By representing all modalities as independent timestamped event streams, 123D unifies multi-modal data from fragmented datasets into a single API that supports both synchronous and asynchronous access, enabling the consolidation of over 3,300 hours of real-world driving data and demonstrating applications in cross-dataset detection and reinforcement learning for planning.
What carries the argument
The independent timestamped event stream representation for each modality, which decouples timing from any prescribed rate and permits flexible synchronous or asynchronous querying across arbitrary datasets without custom loaders.
If this is right
- Detectors trained on the combined data can be evaluated for generalization across different collection conditions and annotation conventions.
- Reinforcement learning agents for driving policies gain access to a much larger and more diverse set of experiences drawn from the full 3,300-hour corpus.
- Researchers can perform systematic audits of pose and calibration accuracy that were previously difficult to compare across sources.
- Analysis and visualization tools become available for the entire collection without writing custom code for each original format.
Where Pith is reading between the lines
- The same event-stream abstraction could extend to other robotics domains such as robotic manipulation or aerial navigation where sensor rates also vary widely.
- If adopted as a release format, future datasets could avoid the fragmentation problem from the start by providing data directly in this structure.
- The unified real and synthetic data opens concrete paths for controlled sim-to-real transfer experiments in both perception and planning.
- Large-scale pretraining of driving policies on the full 90,000 km collection becomes feasible in the same way language models pretrain on text corpora.
Load-bearing premise
That storing each modality as an independent timestamped event stream preserves all necessary information and allows accurate synchronization or asynchronous access without introducing errors or losing fidelity from the original datasets' different rates and annotation conventions.
What would settle it
A side-by-side comparison in which 123D-loaded synchronized frames from two datasets yield object labels or timing offsets that differ from those produced by the datasets' native loaders would falsify the claim of lossless unification.
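That test could be operationalized as a simple metric. The helper below is a hypothetical sketch, not part of 123D: given per-frame tuples of synchronized timestamps from a dataset's native loader and the corresponding tuples recovered through a unified API, it reports the worst-case timing disagreement. Any nonzero value on exactly-stored data would be evidence against lossless unification.

```python
def max_timing_offset(native_frames, reconstructed_frames):
    """Worst-case absolute timestamp disagreement between synchronized
    frames from a native loader and frames rebuilt via a unified API.

    Each element is a tuple of per-modality timestamps for one frame,
    e.g. (lidar_ts, camera_ts) in microseconds."""
    assert len(native_frames) == len(reconstructed_frames)
    return max(
        (max(abs(a - b) for a, b in zip(n, r))
         for n, r in zip(native_frames, reconstructed_frames)),
        default=0,
    )
```

Identical tuples give an offset of 0; a single drifted camera timestamp surfaces as the size of that drift. The same shape of comparison extends to object labels (e.g. per-frame set differences) for the annotation side of the claim.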
read the original abstract
The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate eight real-world driving datasets spanning 3,300 hours and 90,000 kilometers, together with a synthetic dataset with configurable collection scripts, and provide tools for data analysis and visualization. We conduct a systematic study comparing annotation statistics and assessing each dataset's pose and calibration accuracy. Further, we showcase two applications 123D enables: cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions. Code and documentation are available at https://github.com/kesai-labs/py123d.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents 123D, an open-source framework that unifies multi-modal autonomous driving data from eight real-world datasets (3,300 hours, 90,000 km) and one synthetic dataset via a single API. Each modality is stored as an independent timestamped event stream with no prescribed rate to support synchronous or asynchronous access despite differing original rates, formats, and annotation conventions. The work includes tools for analysis and visualization, a systematic comparison of annotation statistics and pose/calibration accuracy, and two applications: cross-dataset 3D object detection transfer and reinforcement learning for planning.
Significance. If the unification preserves fidelity, 123D would be a substantial contribution by lowering barriers to large-scale cross-dataset training and evaluation in autonomous driving. The open-source release, scale of consolidation, and demonstrated applications add practical value; the systematic accuracy study is a positive step toward reproducibility.
major comments (2)
- [§3] §3 (Data Unification and Event Streams): The central claim that representing every modality as an independent timestamped event stream preserves all necessary information and permits exact reconstruction of original synchronous tuples without interpolation artifacts or loss of fidelity is load-bearing for the cross-dataset applications, yet the manuscript provides no quantitative validation (e.g., timestamp round-trip error, rate-mismatch reconstruction error, or annotation IoU before/after schema mapping) across the eight heterogeneous datasets.
- [§4] §4 (Annotation Statistics and Accuracy Study): The systematic comparison of annotation conventions and pose/calibration accuracy is useful, but the paper does not report how inconsistent 3D box conventions or traffic-light taxonomies were mapped into the common schema, nor any drift metrics; this directly affects the reliability of the cross-dataset 3D detection transfer results shown later.
minor comments (2)
- [§5] The abstract and §5 mention 'configurable collection scripts' for the synthetic dataset, but the manuscript does not specify the exact parameters or randomization ranges used, which would aid reproducibility.
- Figure captions for the visualization tools could more explicitly state which modalities are overlaid in each panel to improve clarity for readers unfamiliar with the original dataset formats.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the practical value of 123D. We address the major comments point by point below, indicating where revisions will be made to improve rigor and transparency.
read point-by-point responses
-
Referee: [§3] §3 (Data Unification and Event Streams): The central claim that representing every modality as an independent timestamped event stream preserves all necessary information and permits exact reconstruction of original synchronous tuples without interpolation artifacts or loss of fidelity is load-bearing for the cross-dataset applications, yet the manuscript provides no quantitative validation (e.g., timestamp round-trip error, rate-mismatch reconstruction error, or annotation IoU before/after schema mapping) across the eight heterogeneous datasets.
Authors: We agree that explicit quantitative validation would strengthen the central claim. The event-stream design stores each modality with its native timestamps and raw content, performing no interpolation, resampling, or data alteration; reconstruction of original tuples is achieved by time-window queries on the independent streams. However, the submitted manuscript indeed lacks the requested metrics. In revision we will add a dedicated validation subsection to §3 that reports timestamp round-trip errors and synchronous reconstruction fidelity on representative subsets of the eight datasets, plus before/after annotation IoU statistics for the schema mappings. revision: partial
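The timestamp round-trip metric the authors promise admits a direct sketch (hypothetical code, not the authors'): native timestamps are ingested into a sorted event stream without resampling and then queried back, which should give exactly zero error; snapping to a fixed-rate grid is included for contrast, since a prescribed rate does discard timing information.

```python
import bisect

def round_trip_error(native_timestamps):
    """Ingest timestamps into a sorted event stream as-is, query each one
    back with an at-or-before lookup, and return the worst-case error.
    With no resampling, the stored times are the native times, so the
    error should be exactly 0."""
    stream = sorted(native_timestamps)
    worst = 0
    for t in native_timestamps:
        i = bisect.bisect_right(stream, t) - 1
        worst = max(worst, abs(stream[i] - t))
    return worst

def fixed_grid_error(native_timestamps, period):
    """For contrast: snapping each timestamp to the nearest multiple of
    `period` (a prescribed rate) loses sub-period timing information."""
    return max(abs(t - period * round(t / period)) for t in native_timestamps)
```

Reporting both numbers per dataset would make the "no interpolation, no alteration" claim quantitative rather than architectural.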
-
Referee: [§4] §4 (Annotation Statistics and Accuracy Study): The systematic comparison of annotation conventions and pose/calibration accuracy is useful, but the paper does not report how inconsistent 3D box conventions or traffic-light taxonomies were mapped into the common schema, nor any drift metrics; this directly affects the reliability of the cross-dataset 3D detection transfer results shown later.
Authors: We concur that the mapping procedures and any drift metrics must be documented for reproducibility. The current text describes the target schema but omits the concrete rules used for 3D box coordinate-frame unification, orientation conventions, and traffic-light category harmonization. In the revised §4 we will insert a table and textual description of these mappings together with any available quantitative drift or accuracy metrics drawn from the original dataset releases and our own analysis. This addition will directly support interpretation of the cross-dataset transfer experiments. revision: yes
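The requested mapping documentation lends itself to explicit tables in code. The fragment below is purely illustrative: the category maps and yaw conventions are invented for two fictional source datasets, and only show the shape such a harmonization layer could take; the actual 123D rules are exactly what the referee asks to see documented.

```python
import math

# Hypothetical harmonization tables for two fictional source datasets.
# Assumed common schema: hierarchical categories; yaw measured CCW from +x.
CATEGORY_MAP = {
    "sourceA": {"car": "vehicle.car", "ped": "human.pedestrian"},
    "sourceB": {"sedan": "vehicle.car", "person": "human.pedestrian"},
}
YAW_OFFSET = {"sourceA": 0.0, "sourceB": math.pi / 2}  # B measures yaw from +y

def to_common_schema(source, category, yaw):
    """Map a source-specific (category, yaw) pair into the common schema,
    wrapping the adjusted yaw back into (-pi, pi]."""
    common_cat = CATEGORY_MAP[source][category]
    shifted = yaw + YAW_OFFSET[source]
    common_yaw = math.atan2(math.sin(shifted), math.cos(shifted))
    return common_cat, common_yaw
```

Publishing such tables per dataset, plus the drift metrics the referee requests, would let readers audit every convention decision that feeds the cross-dataset transfer experiments.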
Circularity Check
No circularity: software framework with no derivations or self-referential reductions
full rationale
The paper describes a data unification framework that stores modalities as independent timestamped event streams to enable cross-dataset access. No mathematical derivations, fitted parameters, predictions, uniqueness theorems, or ansatzes are present. Claims concern the existence of the released tool, its coverage of eight datasets, and two downstream applications; these are externally verifiable via the open-source code and data rather than reducing to self-citations or inputs by construction. The central design choice is presented as an engineering decision, not derived from prior results in a load-bearing way.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multi-modal driving data from heterogeneous datasets can be represented as independent timestamped event streams without loss of synchronization or annotation fidelity.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
High performance i/o for large scale deep learning
Alex Aizman, Gavin Maltby, and Thomas Breuel. High performance i/o for large scale deep learning. In 2019 IEEE International Conference on Big Data (Big Data), pages 5965–5967. IEEE, 2019
work page 2019
-
[2]
π0: A vision-language-action flow model for general robot control
Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv.org, 2024
work page 2024
-
[3]
G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000
work page 2000
-
[4]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...
work page 2020
-
[5]
Lerobot: State-of-the-art machine learning for real-world robotics in pytorch
Remi Cadene, Simon Alibert, Alexander Soare, Quentin Gallouedec, Adil Zouitine, Steven Palma, Pepijn Kooijmans, Michel Aractingi, Mustafa Shukor, Dana Aubakirova, Martino Russi, Francesco Capuano, Caroline Pascal, Jade Choghari, Jess Moss, and Thomas Wolf. Lerobot: State-of-the-art machine learning for real-world robotics in pytorch. https://github.com/h...
work page 2024
-
[6]
nuscenes: A multimodal dataset for autonomous driving
Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020
work page 2020
-
[7]
Pseudo-simulation for autonomous driving
Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. Pseudo-simulation for autonomous driving. In Proc. Conf. on Robot Learning (CoRL), 2025
work page 2025
-
[8]
Sam 3: Segment anything with concepts
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. Sam 3: Segment anything with concepts. Proc. of the International Conf. on Learning Representations (ICLR), 2026
work page 2026
-
[9]
Gyusam Chang, Jiwon Lee, Donghyun Kim, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sujin Jang, and Sangpil Kim. Unified domain generalization and adaptation for multi-view 3d object detection. Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[10]
Argoverse: 3d tracking and forecasting with rich maps
Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019
work page 2019
-
[11]
Olmix: A framework for data mixing throughout lm development
Mayee F Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi, Christopher Ré, Luca Soldaini, and Kyle Lo. Olmix: A framework for data mixing throughout lm development. arXiv.org, 2026
work page 2026
-
[12]
Omnire: Omni urban scene reconstruction
Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al. Omnire: Omni urban scene reconstruction. Proc. of the International Conf. on Learning Representations (ICLR), 2025
work page 2025
-
[13]
Masked-attention mask transformer for universal image segmentation
Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022
work page 2022
-
[14]
Open x-embodiment: Robotic learning datasets and rt-x models
OX-Embodiment Collaboration, Abby O’Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), 2023
work page 2023
-
[15]
MMDetection3D: OpenMMLab next-generation platform for general 3D object detection
MMDetection3D Contributors. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d, 2020
work page 2020
-
[16]
PufferDrive: A fast and friendly driving simulator for training and evaluating RL agents, 2026
Daphne Cornelisse*, Spencer Cheng*, Pragnay Mandavilli, Julian Hunt, Kevin Joseph, Waël Doulazmi, Valentin Charraut, Aditya Gupta, Joseph Suarez, and Eugene Vinitsky. PufferDrive: A fast and friendly driving simulator for training and evaluating RL agents, 2026. URL https://github.com/Emerge-Lab/PufferDrive
work page 2026
-
[17]
Robust autonomy emerges from self-play
Marco Cusumano-Towner, David Hafner, Alex Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor Killian, Stuart Bowers, Ozan Sener, et al. Robust autonomy emerges from self-play. Proc. of the International Conf. on Machine Learning (ICML), 2025
work page 2025
-
[18]
Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[19]
Refav: Towards planning-centric scenario mining
Cainan Davidson, Deva Ramanan, and Neehar Peri. Refav: Towards planning-centric scenario mining. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[20]
CARLA: An open urban driving simulator
Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Proc. Conf. on Robot Learning (CoRL), 2017
work page 2017
-
[21]
Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset
Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R Qi, Yin Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021
work page 2021
-
[22]
Unitraj: A unified framework for scalable vehicle trajectory prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, and Alexandre Alahi. Unitraj: A unified framework for scalable vehicle trajectory prediction. In Proc. of the European Conf. on Computer Vision (ECCV), 2024
work page 2024
- [23]
-
[24]
The Apache Software Foundation. Apache arrow. https://github.com/apache/arrow, 2026
work page 2026
-
[25]
Are we ready for autonomous driving? The KITTI vision benchmark suite
Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2012
work page 2012
-
[26]
The llama 3 herd of models
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv.org, 2024
work page 2024
-
[27]
Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, et al. Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research. Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[28]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016
work page 2016
-
[29]
One thousand and one hours: Self-driving motion prediction dataset
John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, and Peter Ondruska. One thousand and one hours: Self-driving motion prediction dataset. In Proc. Conf. on Robot Learning (CoRL), 2021
work page 2021
-
[30]
Planning-oriented autonomous driving
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2023
work page 2023
-
[31]
Open Geospatial Consortium Inc. Opengis® implementation standard for geographic information - simple feature access - part 1: Common architecture. https://www.ogc.org/standards/sfa, 2011
work page 2011
-
[32]
ISO 8855:2011(en) Road vehicles — Vehicle dynamics and road-holding ability — Vocabulary
International Organization for Standardization. ISO 8855:2011(en) Road vehicles — Vehicle dynamics and road-holding ability — Vocabulary. https://www.iso.org/obp/ui/en/#iso:std:iso:8855:, 2011
work page 2011
-
[33]
trajdata: A unified interface to multiple human trajectory datasets
Boris Ivanovic, Guanyu Song, Igor Gilitschenski, and Marco Pavone. trajdata: A unified interface to multiple human trajectory datasets. In Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[34]
Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[35]
Towards learning-based planning: The nuplan benchmark for real-world autonomous driving
Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, et al. Towards learning-based planning: The nuplan benchmark for real-world autonomous driving. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), 2024
work page 2024
-
[36]
3d gaussian splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023
work page 2023
-
[37]
Coin3d: Revisiting configuration-invariant multi-camera 3d object detection
Zhaonian Kuang, Rui Ding, Haotian Wang, Xinhu Zheng, Meng Yang, and Gang Hua. Coin3d: Revisiting configuration-invariant multi-camera 3d object detection. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[38]
Terraseg: Self-supervised ground segmentation for any lidar
Ted Lentsch, Santiago Montiel-Marín, Holger Caesar, and Dariu M Gavrila. Terraseg: Self-supervised ground segmentation for any lidar. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[39]
Str: A simple and efficient algorithm for r-tree packing
Scott T Leutenegger, Mario A Lopez, and Jeffrey Edgington. Str: A simple and efficient algorithm for r-tree packing. In Proceedings 13th International Conference on Data Engineering, pages 497–506. IEEE, 1997
work page 1997
-
[40]
Datasets: A community library for natural language processing
Quentin Lhoest, Albert Villanova Del Moral, Yacine Jernite, Abhishek Thakur, Patrick Von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. Datasets: A community library for natural language processing. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
work page 2021
-
[41]
Quanyi Li, Zhenghao Mark Peng, Lan Feng, Zhizheng Liu, Chenda Duan, Wenjie Mo, and Bolei Zhou. Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling. Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[42]
Mtgs: Multi-traversal gaussian splatting
Tianyu Li, Yihang Qiu, Zhenhua Wu, Carl Lindström, Peng Su, Matthias Nießner, and Hongyang Li. Mtgs: Multi-traversal gaussian splatting. arXiv.org, 2025
work page 2025
-
[43]
Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Jing Yang, Yeqiang Qian, Chunxiang Wang, and Ming Yang. Tactics2d: A highly modular and extensible simulator for driving decision-making. IEEE Transactions on Intelligent Vehicles (T-IV), 2024
work page 2024
-
[44]
Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. Proc. of the European Conf. on Computer Vision (ECCV), 2022
work page 2022
-
[45]
Is ego status all you need for open-loop end-to-end autonomous driving?
Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, and Jose M Alvarez. Is ego status all you need for open-loop end-to-end autonomous driving? In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[46]
Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d
Yiyi Liao, Jun Xie, and Andreas Geiger. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2022
work page 2022
-
[47]
Depth anything 3: Recovering the visual space from any views
Haotong Lin, Sili Chen, Junhao Liew, Donny Y Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views. arXiv.org, 2025
work page 2025
-
[48]
Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, and Alois C Knoll. A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook. IEEE Transactions on Intelligent Vehicles (T-IV), 2024
work page 2024
-
[49]
Petr: Position embedding transformation for multi-view 3d object detection
Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. Petr: Position embedding transformation for multi-view 3d object detection. In Proc. of the European Conf. on Computer Vision (ECCV), 2022
work page 2022
-
[50]
Lead: Minimizing learner-expert asymmetry in end-to-end driving
Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, and Kashyap Chitta. Lead: Minimizing learner-expert asymmetry in end-to-end driving. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[51]
PhysicalAI-Autonomous-Vehicles
NVIDIA. PhysicalAI-Autonomous-Vehicles. https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles, 2025. Hugging Face dataset
work page 2025
-
[52]
NVIDIA DRIVE Hyperion: L4-Ready autonomous vehicle platform
NVIDIA. NVIDIA DRIVE Hyperion: L4-Ready autonomous vehicle platform. https://www.nvidia.com/en-us/solutions/autonomous-vehicles/drive-hyperion/, 2026. NVIDIA product page
work page 2026
-
[53]
PhysicalAI-Autonomous-Vehicles-NCore
NVIDIA. PhysicalAI-Autonomous-Vehicles-NCore. https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore, 2026. Hugging Face dataset
work page 2026
-
[54]
Fastgs: Training 3d gaussian splatting in 100 seconds
Shiwei Ren, Tianci Wen, Yongchun Fang, and Biao Lu. Fastgs: Training 3d gaussian splatting in 100 seconds. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2026
work page 2026
-
[55]
Drivelm: Driving with graph visual question answering
Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In Proc. of the European Conf. on Computer Vision (ECCV), 2024
work page 2024
-
[56]
Scalability in perception for autonomous driving: Waymo open dataset
Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020
work page 2020
-
[57]
Openpcdet: An open-source toolbox for 3d object detection from point clouds
OpenPCDet Development Team. Openpcdet: An open-source toolbox for 3d object detection from point clouds. https://github.com/open-mmlab/OpenPCDet, 2020
work page 2020
-
[58]
Neurad: Neural rendering for autonomous driving
Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. Neurad: Neural rendering for autonomous driving. InProc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[59]
Ignacio Vizzo, Tiziano Guadagnino, Benedikt Mersch, Louis Wiesmann, Jens Behley, and Cyrill Stachniss. Kiss-icp: In defense of point-to-point icp–simple, accurate, and robust registration if done the right way. IEEE Robotics and Automation Letters (RA-L), 8(2):1029–1036, 2023
work page 2023
-
[60]
Towards domain generalization for multi-view 3d object detection in bird-eye-view
Shuo Wang, Xinhai Zhao, Hai-Ming Xu, Zehui Chen, Dameng Yu, Jiahao Chang, Zhen Yang, and Feng Zhao. Towards domain generalization for multi-view 3d object detection in bird-eye-view. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2023
work page 2023
-
[61]
Train in germany, test in the usa: Making 3d object detectors generalize
Yan Wang, Xiangyu Chen, Yurong You, Li Erran Li, Bharath Hariharan, Mark Campbell, Kilian Q Weinberger, and Wei-Lun Chao. Train in germany, test in the usa: Making 3d object detectors generalize. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020
work page 2020
-
[62]
Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Tong Che, Ke Chen, Yuxiao Chen, Jenna Diamond, Yifan Ding, Wenhao Ding, et al. Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail. arXiv.org, 2025
work page 2025
-
[63]
Safe, routine, ready: Autonomous driving in five new cities
Waymo. Safe, routine, ready: Autonomous driving in five new cities. https://waymo.com/blog/2025/11/safe-routine-ready-autonomous-driving-in-new-cities/, 2025. Waymo blog post
work page 2025
-
[64]
Beginning fully autonomous operations with the 6th-generation Waymo driver
Waymo. Beginning fully autonomous operations with the 6th-generation Waymo driver. https://waymo.com/blog/2026/02/ro-on-6th-gen-waymo-driver/, 2026. Waymo blog post
work page 2026
-
[65]
Crossing the pond and beyond: Generalizable AI driving for global deployment
Wayve. Crossing the pond and beyond: Generalizable AI driving for global deployment. https://wayve.ai/thinking/multi-country-generalization/, 2025. Wayve blog post
work page 2025
-
[66]
Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. Advances in Neural Information Processing Systems (NeurIPS), 2021
work page 2021
-
[67]
Pandaset: Advanced sensor suite dataset for autonomous driving
Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In Proc. IEEE Conf. on Intelligent Transportation Systems (ITSC). IEEE, 2021
work page 2021
-
[68]
Runsheng Xu, Hubert Lin, Wonseok Jeon, Hao Feng, Yuliang Zou, Liting Sun, John Gorman, Ekaterina Tolstaya, Sarah Tang, Brandyn White, et al. Wod-e2e: Waymo open dataset for end-to-end driving in challenging long-tail scenarios. arXiv.org, 2025
work page 2025
-
[69]
Xintao Yan, Erdao Liang, Jiawei Wang, Haojie Zhu, and Henry X Liu. Improving traffic signal data quality for the waymo open motion dataset. Transportation Research Part C: Emerging Technologies, 183:105476, 2026
work page 2026
-
[70]
Qwen2.5 technical report
An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, et al. Qwen2.5 technical report. arXiv.org, 2024
work page 2024
-
[71]
Viser: Imperative, web-based 3d visualization in python
Brent Yi, Chung Min Kim, Justin Kerr, Gina Wu, Rebecca Feng, Anthony Zhang, Jonas Kulhanek, Hongsuk Choi, Yi Ma, Matthew Tancik, et al. Viser: Imperative, web-based 3d visualization in python. arXiv.org, 2025
work page 2025
-
[72]
Object detection with a unified label space from multiple datasets
Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, and Ying Wu. Object detection with a unified label space from multiple datasets. In Proc. of the European Conf. on Computer Vision (ECCV), 2020
work page 2020
-
[73]
Simple multi-dataset detection
Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Simple multi-dataset detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022
work page 2022