4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

Diange Yang; Kai Sun; Kane Qian; Kaojin Zhu; Kun Jiang; Mengmeng Yang; Rujun Yan; Xin Zhao; Yining Shi; Zhengqing Pan

arxiv: 2605.18074 · v1 · pith:62QDNEW6new · submitted 2026-05-18 · 💻 cs.RO

Pith reviewed 2026-05-20 10:02 UTC · model grok-4.3

classification 💻 cs.RO

keywords: 4D FMCW Lidar, autonomous driving, radial velocity, motion forecasting, 3D object detection, BEV segmentation, urban dataset, multi-sensor fusion

The pith

Direct velocity measurements from 4D FMCW Lidar improve motion-related perception and planning over geometry alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an open multi-modal dataset collected in Beijing that pairs conventional geometric Lidar with a forward-facing 4D FMCW Lidar delivering point-wise radial velocity. Benchmarks on 3D detection, bird's-eye-view segmentation, flow prediction, and motion forecasting show that models incorporating the velocity channel outperform geometric-only baselines, with the largest gains appearing around pedestrians and fast-moving vehicles. A hybrid auto-labeling plus human-refinement pipeline supplies large-scale 3D bounding boxes with persistent track IDs across five categories. The work positions the velocity channel as a practical addition that supplies explicit motion cues missing from time-of-flight sensors. Public release of the data and evaluation toolkit is intended to support further research on velocity-aware scene understanding and planning.

Core claim

The central claim is that point-wise radial velocity measurements supplied by 4D FMCW Lidar act as complementary motion cues that measurably improve dynamic-scene tasks when added to geometric sensing, with the improvement most evident for vulnerable road users and fast-moving objects in the Beijing urban recordings.

What carries the argument

The forward-facing 4D FMCW Lidar that records radial velocity at each point in addition to range and intensity.
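To make the role of the radial-velocity channel concrete, the sketch below shows one way a per-point Doppler reading might be split into ego-motion and object motion. It is an illustration under stated assumptions, not the paper's or the toolkit's method: the sign convention (positive means receding from the sensor), the availability of the ego velocity in the sensor frame, the function name `compensate_radial_velocity`, and the 0.5 m/s threshold are all hypothetical.

```python
import numpy as np

def compensate_radial_velocity(points, v_radial, v_ego):
    """Remove ego-motion from measured radial velocities.

    points   : (N, 3) point positions in the sensor frame, metres
    v_radial : (N,) measured radial velocity, m/s (assumed positive = receding)
    v_ego    : (3,) sensor velocity in the sensor frame, m/s (assumed known)
    """
    points = np.asarray(points, dtype=float)
    ranges = np.linalg.norm(points, axis=1)
    # Unit line-of-sight vectors; the clamp guards against a zero range.
    los = points / np.maximum(ranges, 1e-9)[:, None]
    # A static point appears to move at -(v_ego . los), so add that term back.
    return np.asarray(v_radial, dtype=float) + los @ np.asarray(v_ego, dtype=float)

# Ego drives forward (+x) at 10 m/s.
v_ego = np.array([10.0, 0.0, 0.0])
pts = np.array([[20.0, 0.0, 0.0],    # static point straight ahead
                [0.0, 15.0, 0.0],    # static point to the side
                [30.0, 0.0, 0.0]])   # car ahead driving away at 15 m/s
measured = np.array([-10.0, 0.0, 5.0])

residual = compensate_radial_velocity(pts, measured, v_ego)
moving = np.abs(residual) > 0.5      # hypothetical threshold, m/s
```

The residual is only the line-of-sight component of an object's velocity. Motion perpendicular to the beam leaves it unchanged, which is one reason the velocity channel complements geometry rather than replacing it.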

If this is right

Velocity-aware models achieve higher precision on 3D detection of pedestrians and cyclists than geometry-only baselines.
Motion forecasting and planning modules trained with the velocity channel reduce error in congested traffic and unprotected turns.
The dataset's persistent track IDs across frames enable consistent evaluation of multi-frame flow and trajectory tasks.
Multi-Lidar fusion pipelines can incorporate the radial-velocity channel from the 4D sensor to improve surround coverage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same velocity channel could be tested for robustness gains in adverse weather or nighttime conditions not covered in the Beijing collection.
Persistent track annotations open the door to longer-horizon trajectory prediction models that build directly on the provided labels.
Combining the velocity measurements with camera semantics might further reduce false positives on vulnerable road users.

Load-bearing premise

The hybrid auto-labeling plus human refinement process produces sufficiently accurate 3D bounding-box annotations with consistent track IDs, and the chosen Beijing urban scenes are representative of the conditions where velocity cues provide the claimed gains.

What would settle it

A re-run of the motion-forecasting benchmark on a held-out set of fast-moving objects or pedestrians that shows no accuracy gain when velocity channels are added would falsify the complementary-cue claim.
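The falsification test described above amounts to comparing forecasting error on a speed-filtered subset. A minimal sketch of that comparison follows, using average displacement error (ADE) as the metric. The function names, the speed threshold, and all numbers are made up for illustration and do not come from the paper or its evaluation toolkit.

```python
import numpy as np

def ade(pred, gt):
    """Average displacement error per agent. pred, gt: (A, T, 2) in metres."""
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=1)

def subset_gain(pred_geo, pred_vel, gt, speeds, min_speed):
    """Mean ADE reduction (geometry-only minus velocity-aware) on fast agents."""
    mask = np.asarray(speeds) >= min_speed
    if not mask.any():
        raise ValueError("no agents above the speed threshold")
    return float((ade(pred_geo, gt)[mask] - ade(pred_vel, gt)[mask]).mean())

# Synthetic example: 3 agents, 2 future steps, ground truth at the origin.
gt = np.zeros((3, 2, 2))
pred_geo = gt + np.array([1.0, 0.0])                    # 1 m error for every agent
pred_vel = gt + np.array([[1.0, 0.0],
                          [0.5, 0.0],
                          [0.0, 0.25]])[:, None, :]     # smaller error for agents 1, 2
speeds = np.array([0.5, 6.0, 12.0])                     # m/s, synthetic

gain = subset_gain(pred_geo, pred_vel, gt, speeds, min_speed=5.0)
```

A gain close to zero (or negative) on a held-out fast-moving subset would be the outcome that counts against the complementary-cue claim. A positive gain alone does not establish it without an uncertainty estimate.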

Figures

Figures reproduced from arXiv: 2605.18074 by Diange Yang, Kai Sun, Kane Qian, Kaojin Zhu, Kun Jiang, Mengmeng Yang, Rujun Yan, Xin Zhao, Yining Shi, Zhengqing Pan.

**Figure 2.** 4DLidarOpen sensor configuration: five Lidars and five surround …

**Figure 3.** 4DLidarOpen data processing pipeline, including raw data collection, sensor synchronization, automatic labeling, human verification, and final dataset …

**Figure 4.** 4DLidarOpen sample showing 4D FMCW Lidar data with velocity information: (a) raw point cloud with radial velocity coloring, (b) semantic …

**Figure 5.** 4DLidarOpen class and richness statistics. (a) Instance counts across five categories (Car, Van, Cyclist, Pedestrian, Traffic Cone) comparing auto-labeled …

**Figure 6.** 4DLidarOpen spatial and density statistics. (a) Object distance distribution showing peak concentration within 50 meters. (b) Distribution of 3D cuboid …

**Figure 7.** 4DLidarOpen speed statistics. (a) Category-wise box plots showing speed distributions for Car, Van, Cyclist, Pedestrian, and Traffic Cone. (b) Overall …

**Figure 8.** 4DLidarOpen campus ablation experiment. Top row: rolling cone scenario; bottom row: darting pedestrian scenario. (a)-(c) 4D FMCW Lidar results …

**Figure 9.** 4DLidarOpen Tianjin crossing test. Top row: pedestrian crossing scenario; bottom row: e-bike crossing scenario. (a) 4D FMCW Lidar + our model; …

Original abstract

We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocity measurements from a forward-facing 4D FMCW Lidar, together with multiple Lidars of different types, including rotating, solid-state, and blind-spot variants, surround-view cameras, and 6-DOF ego-vehicle poses. The dataset was collected in complex urban environments in Beijing and covers dense pedestrian interactions, congested traffic, high-speed driving, and unprotected maneuvers. 4DLidarOpen provides synchronized multi-sensor data and 3D bounding-box annotations with persistent track IDs across five object categories. A hybrid annotation strategy is adopted, where large-scale auto-labeled data support scalable training and human experts refine annotations for the human-annotated training and validation sets. Based on this dataset, we establish benchmarks for 3D object detection, birds-eye view (BEV) segmentation and flow prediction, and motion forecasting with planning. Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding. Compared with geometric-only sensing, the velocity-aware representation improves motion-related perception and downstream forecasting and planning, especially in scenarios involving vulnerable road users and fast-moving objects. These results indicate that 4D FMCW Lidar is a promising sensing modality for motion-aware autonomous driving. The dataset and evaluation toolkit are publicly released to support research on 4D scene understanding, multi-Lidar fusion, and velocity-aware perception and planning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A practical dataset release adding native radial velocity from 4D FMCW Lidar in dense urban scenes, with benchmarks that could help motion tasks, though quantitative gains and annotation checks are underdeveloped.

The main contribution is a new open dataset that pairs a forward-facing 4D FMCW Lidar, which delivers point-wise radial velocity, with multi-type Lidars, cameras, and 6-DOF poses, all collected in busy Beijing traffic. This combination has not shown up in prior public releases, and the authors add benchmarks for velocity-aware BEV flow and motion forecasting on top of standard detection and segmentation tasks. Making raw sensor streams plus track IDs available should let others test motion cues directly rather than relying on geometric inference alone. The hybrid labeling approach (auto-labeling at scale followed by human refinement on the train and val sets) makes sense for producing a usable volume of annotations across five object classes. The intuition that direct velocity helps with pedestrians, cyclists, and fast objects is reasonable and matches what many in the field expect from FMCW sensors. The paper does a clean job documenting the collection setup and releasing an evaluation toolkit.

On the softer side, the abstract states that velocity measurements improve downstream tasks but gives no numbers, error bars, or ablation tables, so the size of the gains stays unclear until the full results section is checked. Track-ID consistency and label accuracy for occluded or high-speed objects are a real point to verify: the hybrid pipeline is described but lacks reported metrics on drift, inter-annotator agreement, or error rates for vulnerable road users. If those checks are missing or weak, some of the claimed advantages could trace back to annotation artifacts instead of sensor complementarity.

This work is aimed at researchers who need velocity-rich data for perception, forecasting, or planning experiments, especially groups already running multi-Lidar or motion-aware models. Dataset users and benchmark developers will get immediate value from the splits and toolkit. It has enough substance and novelty in the sensing modality to deserve a serious referee, even if revisions will likely focus on adding quantitative details and validation numbers. I would send it to review with targeted requests for those elements rather than desk-rejecting it.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving centered on 4D FMCW Lidar that provides point-wise radial velocity measurements in addition to geometric data. Collected in complex Beijing urban scenes, it includes synchronized data from multiple Lidar types, surround-view cameras, and 6-DOF ego poses, along with 3D bounding-box annotations and persistent track IDs for five object categories generated via a hybrid auto-labeling and human-refinement pipeline. Benchmarks are established for 3D object detection, BEV segmentation and flow prediction, and motion forecasting with planning; experiments indicate that incorporating direct velocity measurements yields complementary motion cues that improve performance on motion-related tasks relative to geometric-only sensing, particularly for vulnerable road users and fast-moving objects.

Significance. If the central results hold after addressing validation gaps, the work is significant as the first public release of an open 4D FMCW Lidar dataset with native velocity data, directly supporting research on velocity-aware perception, multi-Lidar fusion, and motion forecasting in autonomous driving. The public release of the dataset and evaluation toolkit is a clear strength that promotes reproducibility and community follow-on work. The experiments provide initial evidence that velocity measurements offer benefits beyond geometry in dynamic scenes, which could influence sensor selection for future AV systems if the quantitative gains are robustly demonstrated.

major comments (2)

§3.2 (Annotation Pipeline): The hybrid auto-labeling plus human-refinement process is described at a high level but reports no quantitative metrics on track-ID consistency across frames, label error rates for fast-moving or occluded objects, or inter-annotator agreement. This is load-bearing for the central claim because all motion-forecasting and planning benchmarks rely on accurate persistent track IDs; without these validation statistics, observed gains from adding radial velocity could be confounded by annotation noise or drift rather than true complementary sensing cues.
§5 (Experiments and Benchmarks): The abstract and results sections claim that velocity-aware representations improve downstream forecasting and planning, yet provide no specific quantitative deltas, error bars, ablation tables, or statistical significance tests comparing velocity-inclusive versus geometric-only inputs. Concrete numbers from the relevant tables or figures are required to evaluate effect sizes and rule out post-hoc scenario selection effects.
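The track-ID consistency statistic requested in the first major comment could be computed along the following lines. This is a minimal sketch, not the authors' validation procedure: it assumes each object has an independently verified reference identity per frame, and the function name, the input layout, and the example IDs are hypothetical.

```python
from collections import defaultdict

def track_id_consistency(frames):
    """Summarise how stable labelled track IDs are over a sequence.

    frames: list of dicts mapping a reference object id to the labelled track id
            in that frame (objects absent from a frame are simply omitted).
    Returns (fraction of objects whose labelled id never changes, total id switches).
    """
    history = defaultdict(list)
    for frame in frames:
        for obj, track in frame.items():
            history[obj].append(track)
    if not history:
        raise ValueError("no objects in the sequence")
    # An ID switch is any change of labelled id between consecutive observations.
    switches = {obj: sum(a != b for a, b in zip(ids, ids[1:]))
                for obj, ids in history.items()}
    stable = sum(1 for s in switches.values() if s == 0)
    return stable / len(switches), sum(switches.values())

frames = [
    {"ped_1": 7, "car_1": 3},
    {"ped_1": 7, "car_1": 3},
    {"ped_1": 9, "car_1": 3},   # pedestrian id changes, e.g. after an occlusion
    {"ped_1": 9, "car_1": 3},
]
fraction_stable, id_switches = track_id_consistency(frames)
```

Reporting both numbers per category (and separately for occluded or fast-moving objects) would address the concern more directly than a single aggregate.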

minor comments (2)

Abstract: The summary of experimental findings would be strengthened by including one or two key quantitative results (e.g., mAP or forecasting error reductions) rather than qualitative statements alone.
§2 (Related Work): Explicit side-by-side comparison with prior datasets (KITTI, nuScenes, Waymo) regarding availability of native velocity channels would clarify the novelty of the 4D FMCW contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thoughtful comments, which have helped us identify areas for improvement in the manuscript. We provide point-by-point responses to the major comments below and indicate the revisions we plan to make.

Referee: §3.2 (Annotation Pipeline): The hybrid auto-labeling plus human-refinement process is described at a high level but reports no quantitative metrics on track-ID consistency across frames, label error rates for fast-moving or occluded objects, or inter-annotator agreement. This is load-bearing for the central claim because all motion-forecasting and planning benchmarks rely on accurate persistent track IDs; without these validation statistics, observed gains from adding radial velocity could be confounded by annotation noise or drift rather than true complementary sensing cues.

Authors: We thank the referee for highlighting this important aspect. The annotation pipeline is indeed critical for the validity of the motion-related benchmarks. While the manuscript provides a high-level description, we acknowledge the lack of quantitative metrics. In the revised version, we will include a dedicated subsection in §3.2 reporting track-ID consistency across frames (e.g., percentage of tracks maintained over sequences), estimated label error rates for challenging cases like fast-moving and occluded objects, and inter-annotator agreement scores from the human refinement process. This will help confirm that the observed benefits from velocity measurements are not confounded by annotation issues.
Revision: yes

Referee: §5 (Experiments and Benchmarks): The abstract and results sections claim that velocity-aware representations improve downstream forecasting and planning, yet provide no specific quantitative deltas, error bars, ablation tables, or statistical significance tests comparing velocity-inclusive versus geometric-only inputs. Concrete numbers from the relevant tables or figures are required to evaluate effect sizes and rule out post-hoc scenario selection effects.

Authors: We agree that the presentation of the experimental results can be strengthened by providing more explicit quantitative comparisons. In the revised manuscript, we will update §5 to include specific numerical deltas between velocity-inclusive and geometric-only models, add error bars where appropriate, present detailed ablation tables, and report statistical significance tests for the observed improvements. This will allow readers to better assess the effect sizes and robustness of the findings.
Revision: yes
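One common way to attach an uncertainty estimate to a velocity-inclusive versus geometric-only comparison is a paired bootstrap over scenes. The sketch below illustrates that idea only; it is not the analysis the authors ran or promised. The function name, the per-scene error values, and the choice of 2000 resamples are assumptions made for the example.

```python
import numpy as np

def paired_bootstrap_ci(err_geo, err_vel, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for mean(err_geo - err_vel) over paired scenes.

    err_geo, err_vel : per-scene errors of the two models on the SAME scenes.
    Returns (mean difference, lower bound, upper bound).
    """
    diff = np.asarray(err_geo, dtype=float) - np.asarray(err_vel, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample scene indices with replacement, keeping the pairing intact.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)

# Synthetic per-scene forecasting errors in metres (made-up numbers).
err_geo = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
err_vel = [1.0, 0.8, 1.1, 1.0, 1.2, 0.9]
mean_gain, ci_low, ci_high = paired_bootstrap_ci(err_geo, err_vel)
```

An interval that excludes zero suggests the gain is not a resampling artifact. It does not by itself rule out the post-hoc scenario selection the referee raises, which requires fixing the evaluation subsets in advance.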

Circularity Check

0 steps flagged

Dataset release with empirical benchmarks; no circular derivation chain

The paper releases raw multi-modal sensor data, 3D bounding-box annotations with track IDs, and runs standard benchmarks for detection, segmentation, flow prediction, and motion forecasting on fixed splits. The central claim—that radial velocity measurements provide complementary cues—is supported by direct experimental comparisons of velocity-aware versus geometric-only inputs on the collected Beijing urban scenes. No equations, fitted parameters, or self-citations are used to define or force the reported improvements; the results are falsifiable against the public dataset and external validation. This is a standard empirical dataset contribution with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard assumptions about sensor synchronization and calibration accuracy plus the representativeness of the collected Beijing scenes; no new physical entities or free parameters are introduced beyond typical dataset annotation thresholds.

axioms (2)

domain assumption: Multi-sensor data streams are accurately time-synchronized and spatially calibrated.
Required for all multi-modal fusion and velocity-aware benchmarks described in the abstract.
domain assumption: Hybrid auto-labeling followed by human refinement yields sufficiently accurate 3D bounding boxes and persistent track IDs.
Central to the claim that the dataset supports reliable training and evaluation of motion-aware tasks.
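The time-synchronization assumption can be checked on any multi-sensor log by pairing each frame of one stream with the nearest frame of another and rejecting pairs whose offset is too large. The sketch below shows that check in its simplest form. The function name, the 30 ms tolerance, and the timestamps are hypothetical and are not taken from the dataset or its toolkit.

```python
import numpy as np

def match_timestamps(ref_ts, other_ts, tol):
    """For each reference timestamp, return the index of the nearest timestamp
    in the other stream, or -1 if the nearest one is further away than tol.

    ref_ts, other_ts : 1-D sorted arrays of seconds (other_ts must be non-empty)
    tol              : maximum accepted offset in seconds (hypothetical value)
    """
    ref_ts = np.asarray(ref_ts, dtype=float)
    other_ts = np.asarray(other_ts, dtype=float)
    # Candidate neighbours on either side of each reference timestamp.
    right = np.searchsorted(other_ts, ref_ts)
    left = np.clip(right - 1, 0, other_ts.size - 1)
    right = np.clip(right, 0, other_ts.size - 1)
    pick_right = np.abs(other_ts[right] - ref_ts) < np.abs(other_ts[left] - ref_ts)
    nearest = np.where(pick_right, right, left)
    ok = np.abs(other_ts[nearest] - ref_ts) <= tol
    return np.where(ok, nearest, -1)

lidar_ts = np.array([0.00, 0.10, 0.20, 0.30])   # synthetic, seconds
camera_ts = np.array([0.01, 0.12, 0.26])        # synthetic, seconds
matches = match_timestamps(lidar_ts, camera_ts, tol=0.03)
```

In this example the last two Lidar frames have no camera frame within tolerance and are marked `-1`. The fraction of unmatched frames is a simple summary of how well the assumption holds for a given recording.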

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

Llama 2: Open Foundation and Fine-Tuned Chat Models

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosaleet al., “Llama 2: Open foundation and fine-tuned chat models,”arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[48]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763

work page 2021
[49]

GPT-Driver: Learning to Drive with GPT

J. Mao, Y . Qian, J. Ye, H. Zhao, and Y . Wang, “Gpt-driver: Learning to drive with gpt,”arXiv preprint arXiv:2310.01415, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[50]

Drivemlm: Aligning multi-modal large language models with behavioral planning states for au- tonomous driving

W. Wang, J. Xie, C. Hu, H. Zou, J. Fan, W. Tong, Y . Wen, S. Wu, H. Deng, Z. Liet al., “Drivemlm: Aligning multi-modal large language models with behavioral planning states for autonomous driving,”arXiv preprint arXiv:2312.09245, 2023

work page arXiv 2023
[51]

Lmdrive: Closed-loop end-to-end driving with large language models,

H. Shao, Y . Hu, L. Wang, G. Song, S. L. Waslander, Y . Liu, and H. Li, “Lmdrive: Closed-loop end-to-end driving with large language models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 120–15 130

work page 2024
[52]

A language agent for autonomous driving,

J. Mao, J. Ye, Y . Qian, M. Pavone, and Y . Wang, “A language agent for autonomous driving,”arXiv preprint arXiv:2311.10813, 2023

work page arXiv 2023
[53]

Driveagent: Multi-agent structured reasoning with llm and multimodal sensor fusion for autonomous driving,

X. Hou, W. Wang, L. Yang, H. Lin, J. Feng, H. Min, and X. Zhao, “Driveagent: Multi-agent structured reasoning with llm and multimodal sensor fusion for autonomous driving,”arXiv preprint arXiv:2505.02123, 2025. SUBMITTED TO IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 15

work page arXiv 2025
[54]

Dilu: A knowledge-driven approach to au- tonomous driving with large language models

L. Wen, D. Fu, X. Li, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, and Y . Qiao, “Dilu: A knowledge-driven approach to autonomous driving with large language models,”arXiv preprint arXiv:2309.16292, 2023

work page arXiv 2023
[55]

Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,

K. Jiang, X. Cai, Z. Cui, A. Li, Y . Ren, H. Yu, H. Yang, D. Fu, L. Wen, and P. Cai, “Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,”IEEE Transactions on Intelligent Vehicles, 2024

work page 2024
[56]

Instruct large language models to drive like humans,

R. Zhang, X. Guo, W. Zheng, C. Zhang, K. Keutzer, and L. Chen, “Instruct large language models to drive like humans,”arXiv preprint arXiv:2406.07296, 2024

work page arXiv 2024
[57]

Drivelmm- o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding,

A. Ishaq, J. Lahoud, K. More, O. Thawakar, R. Thawkar, D. Dis- sanayake, N. Ahsan, Y . Li, F. S. Khan, H. Cholakkalet al., “Drivelmm- o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding,”arXiv preprint arXiv:2503.10621, 2025

work page arXiv 2025
[58]

Drivemm: All-in-one large multimodal model for autonomous driving,

Z. Huang, C. Feng, F. Yan, B. Xiao, Z. Jie, Y . Zhong, X. Liang, and L. Ma, “Drivemm: All-in-one large multimodal model for autonomous driving,”arXiv preprint arXiv:2412.07689, 2024

work page arXiv 2024
[59]

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Senna: Bridging large vision-language models and end-to-end autonomous driving,”arXiv preprint arXiv:2410.22313, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[60]

Otter: A vision-language-action model with text-aware visual feature extraction

H. Huang, F. Liu, L. Fu, T. Wu, M. Mukadam, J. Malik, K. Goldberg, and P. Abbeel, “Otter: A vision-language-action model with text-aware visual feature extraction,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03734

work page arXiv 2025
[61]

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Z. Zhou, T. Cai, S. Z. Zhao, Y . Zhang, Z. Huang, B. Zhou, and J. Ma, “Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13757

work page internal anchor Pith review Pith/arXiv arXiv 2025
[62]

arXiv preprint arXiv:2505.04769 (2025)

R. Sapkota, Y . Cao, K. I. Roumeliotis, and M. Karkee, “Vision-language- action models: Concepts, progress, applications and challenges,”arXiv preprint arXiv:2505.04769, 2025

work page arXiv 2025
[63]

A Survey on Vision-Language-Action Models for Embodied AI

Y . Ma, Z. Song, Y . Zhuang, J. Hao, and I. King, “A survey on vision-language-action models for embodied ai,”arXiv preprint arXiv:2405.14093, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[64]

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” arXiv preprint arXiv:2503.19755, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[65]

Covla: Comprehensive vision-language-action dataset for autonomous driving,

H. Arai, K. Miwa, K. Sasaki, K. Watanabe, Y . Yamaguchi, S. Aoki, and I. Yamamoto, “Covla: Comprehensive vision-language-action dataset for autonomous driving,” in2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 1933–1943

work page 2025
[66]

Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,

Z. Chen, M. Ye, S. Xu, T. Cao, and Q. Chen, “Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,” in European Conference on Computer Vision. Springer, 2024, pp. 239– 256

work page 2024
[67]

Para- drive: Parallelized architecture for real-time autonomous driving,

X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone, “Para- drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 449–15 458

work page 2024
[68]

Chatmpc: Natural language based mpc personalization,

Y . Miyaoka, M. Inoue, and T. Nii, “Chatmpc: Natural language based mpc personalization,” in2024 American Control Conference (ACC). IEEE, 2024, pp. 3598–3603

work page 2024
[69]

Vlm-mpc: Vision language foundation model (vlm)-guided model predictive controller (mpc) for autonomous driving,

K. Long, H. Shi, J. Liu, and X. Li, “Vlm-mpc: Vision language foundation model (vlm)-guided model predictive controller (mpc) for autonomous driving,”arXiv preprint arXiv:2408.04821, 2024

work page arXiv 2024
[70]

Data scaling laws for end-to-end autonomous driving,

A. Naumann, X. Gu, T. Dimlioglu, M. Bojarski, A. Degirmenci, A. Popov, D. Bisla, M. Pavone, U. M ¨uller, and B. Ivanovic, “Data scaling laws for end-to-end autonomous driving,”arXiv preprint arXiv:2504.04338, 2025

work page arXiv 2025
[71]

Data scaling laws for imitation learning-based end-to-end autonomous driving.arXiv preprint arXiv:2412.02689, 2024

Y . Zheng, Z. Xia, Q. Zhang, T. Zhang, B. Lu, X. Huo, C. Han, Y . Li, M. Yu, B. Jinet al., “Preliminary investigation into data scaling laws for imitation learning-based end-to-end autonomous driving,”arXiv preprint arXiv:2412.02689, 2024

work page arXiv 2024
[72]

nuscenes: A multimodal dataset for autonomous driving,

H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631

work page 2020
[73]

Scalability in perception for autonomous driving: Waymo open dataset,

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caineet al., “Scalability in perception for autonomous driving: Waymo open dataset,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454

work page 2020
[74]

Argoverse: 3d tracking and forecasting with rich maps,

M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramananet al., “Argoverse: 3d tracking and forecasting with rich maps,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8748– 8757

work page 2019
[75]

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Ponteset al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,”arXiv preprint arXiv:2301.00493, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[76]

Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,

Y . Li and J. Ibanez-Guzman, “Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,”IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, 2020

work page 2020
[77]

Helipr: Heterogeneous lidar dataset for inter-lidar place recognition under spa- tiotemporal variations,

M. Jung, W. Yang, D. Lee, H. Gil, G. Kim, and A. Kim, “Helipr: Heterogeneous lidar dataset for inter-lidar place recognition under spa- tiotemporal variations,”The International Journal of Robotics Research, vol. 43, no. 12, pp. 1867–1883, 2024

work page 2024
[78]

DICP: Doppler Iterative Closest Point Algorithm,

B. Hexsel, H. Vhavle, and Y . Chen, “DICP: Doppler Iterative Closest Point Algorithm,” inProceedings of Robotics: Science and Systems, New York City, NY , USA, June 2022

work page 2022
[79]

Tracking 3d moving objects as centroids using fmcw lidar,

Y . Zeng, Y . Yu, S. Qi, and T. Wu, “Tracking 3d moving objects as centroids using fmcw lidar,” inProceedings of 4th 2024 International Conference on Autonomous Unmanned Systems (4th ICAUS 2024), L. Liu, Y . Niu, W. Fu, and Y . Qu, Eds. Singapore: Springer Nature Singapore, 2025, pp. 536–545

work page 2024
[80]

Towards fast correspondence-free odometry using multiple fmcw lidars,

D. J. Yoon, Y . Chen, H. Vhavle, J. Reuther, and T. D. Barfoot, “Towards fast correspondence-free odometry using multiple fmcw lidars,”IEEE Robotics and Automation Letters, vol. 10, no. 9, pp. 9088–9095, 2025

work page 2025

Showing first 80 references.

[1] [1]

End-to-end autonomous driving: Challenges and frontiers,

L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024

[2] [2]

A survey on vision-language- action models for autonomous driving,

S. Jiang, Z. Huang, K. Qian, Z. Luo, T. Zhu, Y . Zhong, Y . Tang, M. Kong, Y . Wang, S. Jiaoet al., “A survey on vision-language- action models for autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4524–4536

work page 2025

[3] [3]

Agentthink: A unified framework for tool-augmented chain-of-thought reasoning in vision-language models for autonomous driving,

K. Qian, S. Jiang, Y . Zhong, Z. Luo, Z. Huang, T. Zhu, K. Jiang, M. Yang, Z. Fu, J. Miaoet al., “Agentthink: A unified framework for tool-augmented chain-of-thought reasoning in vision-language models for autonomous driving,”arXiv preprint arXiv:2505.15298, vol. 1, no. 2, p. 3, 2025

work page arXiv 2025

[4] [4]

4d-are: Bridging the attribution gap in llm agent requirements engineering,

B. Yu and L. Zhao, “4d-are: Bridging the attribution gap in llm agent requirements engineering,” 2026. [Online]. Available: https://arxiv.org/abs/2601.04556

work page arXiv 2026

[5] [5]

Streamingflow: Streaming occupancy forecasting with asynchronous multi-modal data streams via neural ordinary differential equation,

Y . Shi, K. Jiang, K. Wang, J. Li, Y . Wang, M. Yang, and D. Yang, “Streamingflow: Streaming occupancy forecasting with asynchronous multi-modal data streams via neural ordinary differential equation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 833–14 842

work page 2024

[6] [6]

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, S. Zhang, C. Huang, C. Liu, and X. Wang, “Vadv2: End-to-end vectorized autonomous driving via probabilistic planning,”arXiv preprint arXiv:2402.13243, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Driveworld: 4d pre-trained scene understanding via world models for autonomous driving,

C. Min, D. Zhao, L. Xiao, J. Zhao, X. Xu, Z. Zhu, L. Jin, J. Li, Y . Guo, J. Xinget al., “Driveworld: 4d pre-trained scene understanding via world models for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 15 522–15 533

work page 2024

[8] [8]

World4drive: End-to-end autonomous driving via intention-aware physical latent world model,

Y . Zheng, P. Yang, Z. Xing, Q. Zhang, Y . Zheng, Y . Gao, P. Li, T. Zhang, Z. Xia, P. Jiaet al., “World4drive: End-to-end autonomous driving via intention-aware physical latent world model,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 28 632–28 642

work page 2025

[9] [9]

Vision meets robotics: The KITTI dataset,

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,”The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, Sep. 2013

work page 2013

[10] [10]

nuscenes: A multi- modal dataset for autonomous driving,

H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multi- modal dataset for autonomous driving,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

work page 2020

[11] [11]

Are we ready for autonomous driving? The KITTI vision benchmark suite,

A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2012

work page 2012

[12] [12]

PointPillars: Fast encoders for object detection from point clouds,

A. H. Lang, S. V ora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast encoders for object detection from point clouds,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

work page 2019

[13] [13]

Center-based 3d object detection and tracking,

T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3d object detection and tracking,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. SUBMITTED TO IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 14

work page 2021

[14] [14]

Semantickitti: A dataset for semantic scene understanding of lidar sequences,

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019

work page 2019

[15] [15]

Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation.arXiv preprint arXiv:2205.13542, 2022

Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. Rus, and S. Han, “Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,”arXiv preprint arXiv:2205.13542, 2022

work page arXiv 2022

[16] [16]

Learning lane graph representations for motion forecasting,

M. Liang, B. Yang, R. Hu, Y . Chen, R. Liao, S. Feng, and R. Urtasun, “Learning lane graph representations for motion forecasting,” inECCV, 2020

work page 2020

[17] [17]

Multi-head attention for multi-modal joint vehicle motion forecasting,

J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and G. P. Gil, “Multi-head attention for multi-modal joint vehicle motion forecasting,” inICRA. IEEE, 2020

work page 2020

[18] [18]

The Cityscapes Dataset for Semantic Urban Scene Understanding,

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benen- son, U. Franke, S. Roth, and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213–3223

work page 2016

[19] [19]

TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents

Y . Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha, “Trafficpredict: Trajectory prediction for heterogeneous traffic-agents,” CoRR, vol. abs/1811.02146, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[20] [20]

Scalability in Percep- tion for Autonomous Driving: Waymo Open Dataset,

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caine, V . Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y . Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in Percep- tion for Autonomous Driving: Waymo Open Dataset,” inProceedings of the IEEE/...

work page 2020

[21] [21]

Coda: A real-world road corner case dataset for object detection in autonomous driving,

K. Li, K. Chen, H. Wang, L. Hong, C. Ye, J. Han, Y . Chen, W. Zhang, C. Xu, D.-Y . Yeunget al., “Coda: A real-world road corner case dataset for object detection in autonomous driving,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 406–423

work page 2022

[22] [22]

Pandaset: Advanced sensor suite dataset for autonomous driving,

P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jianget al., “Pandaset: Advanced sensor suite dataset for autonomous driving,” in2021 IEEE international intelligent transportation systems conference (ITSC). IEEE, 2021, pp. 3095–3101

work page 2021

[23] [23]

Carla: An open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “Carla: An open urban driving simulator,” inConference on robot learning. PMLR, 2017, pp. 1–16

work page 2017

[24] [24]

Lego-motion: Learning-enhanced grids with occupancy instance modeling for class-agnostic motion prediction,

K. Qian, J. Miao, Z. Luo, Z. Fu, J. Li, Y . Shi, Y . Wang, K. Jiang, M. Yang, and D. Yang, “Lego-motion: Learning-enhanced grids with occupancy instance modeling for class-agnostic motion prediction,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 14 178–14 185

work page 2025

[25] [25]

A survey of motion planning and control techniques for self-driving urban vehicles,

B. Paden, M. ˇC´ap, S. Z. Yong, D. Yershov, and E. Frazzoli, “A survey of motion planning and control techniques for self-driving urban vehicles,” IEEE Transactions on intelligent vehicles, vol. 1, no. 1, pp. 33–55, 2016

work page 2016

[26] [26]

End to End Learning for Self-Driving Cars

M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhanget al., “End to end learning for self-driving cars,”arXiv preprint arXiv:1604.07316, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[27] [27]

End-to-end driving via conditional imitation learning,

F. Codevilla, M. M ¨uller, A. L ´opez, V . Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 4693–4700

work page 2018

[28] [28]

End-to-end learning of driving models from large-scale video datasets,

H. Xu, Y . Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174– 2182

work page 2017

[29] [29]

Learning to drive in a day,

A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V .-D. Lam, A. Bewley, and A. Shah, “Learning to drive in a day,” in2019 international conference on robotics and automation (ICRA). IEEE, 2019, pp. 8248–8254

work page 2019

[30] [30]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[31] [31]

Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,

Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han, “Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,” in2023 IEEE international conference on robotics and automation (ICRA). IEEE, 2023, pp. 2774–2781

work page 2023

[32] [32]

Multi-modal fusion transformer for end-to-end autonomous driving,

A. Prakash, K. Chitta, and A. Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7077–7087

work page 2021

[33] [33]

Transfuser: Imitation with transformer-based sensor fusion for au- tonomous driving,

K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “Transfuser: Imitation with transformer-based sensor fusion for au- tonomous driving,”IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 11, pp. 12 878–12 895, 2022

work page 2022

[34] [34]

Vad: Vectorized scene representation for efficient autonomous driving,

B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, S. Zhang, C. Liu, C. Huang, X. Wanget al., “Vad: Vectorized scene representation for efficient autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 8340– 8350

work page 2023

[35] [35]

An lstm network for highway trajectory prediction,

F. Altch ´e and A. de La Fortelle, “An lstm network for highway trajectory prediction,” in2017 IEEE 20th international conference on intelligent transportation systems (ITSC). IEEE, 2017, pp. 353–359

work page 2017

[36] [36]

Multi-task learning with deep neural networks: A survey,

M. Crawshaw, “Multi-task learning with deep neural networks: A survey,”arXiv preprint arXiv:2009.09796, 2020

work page arXiv 2009

[37] [37]

Multi-task learning with attention for end-to-end autonomous driving,

K. Ishihara, A. Kanervisto, J. Miura, and V . Hautamaki, “Multi-task learning with attention for end-to-end autonomous driving,” inPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 2902–2911

work page 2021

[38] [38]

Planning-oriented autonomous driving,

Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wanget al., “Planning-oriented autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17 853–17 862

work page 2023

[39] [39]

Vavim and vavam: Autonomous driving through video generative modeling

F. Bartoccioni, E. Ramzi, V . Besnier, S. Venkataramanan, T.-H. Vu, Y . Xu, L. Chambon, S. Gidaris, S. Odabas, D. Hurychet al., “Vavim and vavam: Autonomous driving through video generative modeling,” arXiv preprint arXiv:2502.15672, 2025

work page arXiv 2025

[40] [40]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4195–4205

work page 2023

[41] [41]

Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,

B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y . Zhang, Q. Zhang, and X. Wang, “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,”arXiv preprint arXiv:2411.15139, 2024

work page arXiv 2024

[42] [42]

Hidden biases of end-to-end driving models,

B. Jaeger, K. Chitta, and A. Geiger, “Hidden biases of end-to-end driving models,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8240–8249

work page 2023

[43] [43]

End-to-end interpretable neural motion planner,

W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8660–8669

work page 2019

[44] [44]

Perceive, predict, and plan: Safe motion planning through interpretable semantic representations,

A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, “Perceive, predict, and plan: Safe motion planning through interpretable semantic representations,” inComputer Vision–ECCV 2020: 16th Euro- pean Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16. Springer, 2020, pp. 414–430

work page 2020

[45] [45]

Direct preference optimization: Your language model is secretly a reward model,

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,”Advances in Neural Information Processing Systems, vol. 36, pp. 53 728–53 741, 2023

work page 2023

[46] [46]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azharet al., “Llama: Open and efficient foundation language models,”arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[47] [47]

Llama 2: Open Foundation and Fine-Tuned Chat Models

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosaleet al., “Llama 2: Open foundation and fine-tuned chat models,”arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[48] [48]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763

work page 2021

[49] [49]

GPT-Driver: Learning to Drive with GPT

J. Mao, Y . Qian, J. Ye, H. Zhao, and Y . Wang, “Gpt-driver: Learning to drive with gpt,”arXiv preprint arXiv:2310.01415, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[50] [50]

Drivemlm: Aligning multi-modal large language models with behavioral planning states for au- tonomous driving

W. Wang, J. Xie, C. Hu, H. Zou, J. Fan, W. Tong, Y . Wen, S. Wu, H. Deng, Z. Liet al., “Drivemlm: Aligning multi-modal large language models with behavioral planning states for autonomous driving,”arXiv preprint arXiv:2312.09245, 2023

work page arXiv 2023

[51] [51]

Lmdrive: Closed-loop end-to-end driving with large language models,

H. Shao, Y . Hu, L. Wang, G. Song, S. L. Waslander, Y . Liu, and H. Li, “Lmdrive: Closed-loop end-to-end driving with large language models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 120–15 130

work page 2024

[52] [52]

A language agent for autonomous driving,

J. Mao, J. Ye, Y . Qian, M. Pavone, and Y . Wang, “A language agent for autonomous driving,”arXiv preprint arXiv:2311.10813, 2023

work page arXiv 2023

[53] [53]

Driveagent: Multi-agent structured reasoning with llm and multimodal sensor fusion for autonomous driving,

X. Hou, W. Wang, L. Yang, H. Lin, J. Feng, H. Min, and X. Zhao, “Driveagent: Multi-agent structured reasoning with llm and multimodal sensor fusion for autonomous driving,”arXiv preprint arXiv:2505.02123, 2025. SUBMITTED TO IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 15

work page arXiv 2025

[54] [54]

Dilu: A knowledge-driven approach to au- tonomous driving with large language models

L. Wen, D. Fu, X. Li, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, and Y . Qiao, “Dilu: A knowledge-driven approach to autonomous driving with large language models,”arXiv preprint arXiv:2309.16292, 2023

work page arXiv 2023

[55] [55]

Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,

K. Jiang, X. Cai, Z. Cui, A. Li, Y . Ren, H. Yu, H. Yang, D. Fu, L. Wen, and P. Cai, “Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,”IEEE Transactions on Intelligent Vehicles, 2024

work page 2024

[56] [56]

Instruct large language models to drive like humans,

R. Zhang, X. Guo, W. Zheng, C. Zhang, K. Keutzer, and L. Chen, “Instruct large language models to drive like humans,”arXiv preprint arXiv:2406.07296, 2024

[57]

Drivelmm-o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding,

A. Ishaq, J. Lahoud, K. More, O. Thawakar, R. Thawkar, D. Dissanayake, N. Ahsan, Y. Li, F. S. Khan, H. Cholakkal et al., “Drivelmm-o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding,” arXiv preprint arXiv:2503.10621, 2025

[58]

Drivemm: All-in-one large multimodal model for autonomous driving,

Z. Huang, C. Feng, F. Yan, B. Xiao, Z. Jie, Y. Zhong, X. Liang, and L. Ma, “Drivemm: All-in-one large multimodal model for autonomous driving,” arXiv preprint arXiv:2412.07689, 2024

[59]

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Senna: Bridging large vision-language models and end-to-end autonomous driving,” arXiv preprint arXiv:2410.22313, 2024

[60]

Otter: A vision-language-action model with text-aware visual feature extraction

H. Huang, F. Liu, L. Fu, T. Wu, M. Mukadam, J. Malik, K. Goldberg, and P. Abbeel, “Otter: A vision-language-action model with text-aware visual feature extraction,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03734

[61]

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Z. Zhou, T. Cai, S. Z. Zhao, Y. Zhang, Z. Huang, B. Zhou, and J. Ma, “Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13757

[62]

Vision-language-action models: Concepts, progress, applications and challenges,

R. Sapkota, Y. Cao, K. I. Roumeliotis, and M. Karkee, “Vision-language-action models: Concepts, progress, applications and challenges,” arXiv preprint arXiv:2505.04769, 2025

[63]

A Survey on Vision-Language-Action Models for Embodied AI

Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King, “A survey on vision-language-action models for embodied ai,” arXiv preprint arXiv:2405.14093, 2024

[64]

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” arXiv preprint arXiv:2503.19755, 2025

[65]

Covla: Comprehensive vision-language-action dataset for autonomous driving,

H. Arai, K. Miwa, K. Sasaki, K. Watanabe, Y. Yamaguchi, S. Aoki, and I. Yamamoto, “Covla: Comprehensive vision-language-action dataset for autonomous driving,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 1933–1943

[66]

Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,

Z. Chen, M. Ye, S. Xu, T. Cao, and Q. Chen, “Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,” in European Conference on Computer Vision. Springer, 2024, pp. 239–256

[67]

Para-drive: Parallelized architecture for real-time autonomous driving,

X. Weng, B. Ivanovic, Y. Wang, Y. Wang, and M. Pavone, “Para-drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15449–15458

[68]

Chatmpc: Natural language based mpc personalization,

Y. Miyaoka, M. Inoue, and T. Nii, “Chatmpc: Natural language based mpc personalization,” in 2024 American Control Conference (ACC). IEEE, 2024, pp. 3598–3603

[69]

Vlm-mpc: Vision language foundation model (vlm)-guided model predictive controller (mpc) for autonomous driving,

K. Long, H. Shi, J. Liu, and X. Li, “Vlm-mpc: Vision language foundation model (vlm)-guided model predictive controller (mpc) for autonomous driving,” arXiv preprint arXiv:2408.04821, 2024

[70]

Data scaling laws for end-to-end autonomous driving,

A. Naumann, X. Gu, T. Dimlioglu, M. Bojarski, A. Degirmenci, A. Popov, D. Bisla, M. Pavone, U. Müller, and B. Ivanovic, “Data scaling laws for end-to-end autonomous driving,” arXiv preprint arXiv:2504.04338, 2025

[71]

Preliminary investigation into data scaling laws for imitation learning-based end-to-end autonomous driving,

Y. Zheng, Z. Xia, Q. Zhang, T. Zhang, B. Lu, X. Huo, C. Han, Y. Li, M. Yu, B. Jin et al., “Preliminary investigation into data scaling laws for imitation learning-based end-to-end autonomous driving,” arXiv preprint arXiv:2412.02689, 2024

[72]

nuscenes: A multimodal dataset for autonomous driving,

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11621–11631

[73]

Scalability in perception for autonomous driving: Waymo open dataset,

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454

[74]

Argoverse: 3d tracking and forecasting with rich maps,

M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan et al., “Argoverse: 3d tracking and forecasting with rich maps,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8748–8757

[75]

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” arXiv preprint arXiv:2301.00493, 2023

[76]

Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,

Y. Li and J. Ibanez-Guzman, “Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,” IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, 2020

[77]

Helipr: Heterogeneous lidar dataset for inter-lidar place recognition under spatiotemporal variations,

M. Jung, W. Yang, D. Lee, H. Gil, G. Kim, and A. Kim, “Helipr: Heterogeneous lidar dataset for inter-lidar place recognition under spatiotemporal variations,” The International Journal of Robotics Research, vol. 43, no. 12, pp. 1867–1883, 2024

[78]

DICP: Doppler Iterative Closest Point Algorithm,

B. Hexsel, H. Vhavle, and Y. Chen, “DICP: Doppler Iterative Closest Point Algorithm,” in Proceedings of Robotics: Science and Systems, New York City, NY, USA, June 2022

[79]

Tracking 3d moving objects as centroids using fmcw lidar,

Y. Zeng, Y. Yu, S. Qi, and T. Wu, “Tracking 3d moving objects as centroids using fmcw lidar,” in Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems (4th ICAUS 2024), L. Liu, Y. Niu, W. Fu, and Y. Qu, Eds. Singapore: Springer Nature Singapore, 2025, pp. 536–545

[80]

Towards fast correspondence-free odometry using multiple fmcw lidars,

D. J. Yoon, Y. Chen, H. Vhavle, J. Reuther, and T. D. Barfoot, “Towards fast correspondence-free odometry using multiple fmcw lidars,” IEEE Robotics and Automation Letters, vol. 10, no. 9, pp. 9088–9095, 2025
