4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
Pith reviewed 2026-05-20 10:02 UTC · model grok-4.3
The pith
Direct velocity measurements from 4D FMCW Lidar improve motion-related perception and planning over geometry alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that point-wise radial velocity measurements supplied by 4D FMCW Lidar act as complementary motion cues that measurably improve dynamic-scene tasks when added to geometric sensing, with the improvement most evident for vulnerable road users and fast-moving objects in the Beijing urban recordings.
What carries the argument
The forward-facing 4D FMCW Lidar that records radial velocity at each point in addition to range and intensity.
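The velocity channel measures the rate of change of range along each beam. The sketch below shows how the ego-motion term can be removed from it. It assumes a sign convention in which the measured value is the point's velocity relative to the sensor, projected onto the line of sight and positive when the point recedes. It also assumes the sensor's linear velocity is known in the sensor frame, for instance derived from the ego poses and the sensor extrinsics. The function name and conventions are illustrative assumptions, not the dataset toolkit's API, and real sensors may report the opposite sign. Sensor rotation adds only velocity perpendicular to the beam, so only the translational ego velocity enters the correction.

```python
import numpy as np

def compensate_radial_velocity(points, v_radial, v_ego):
    """Remove the ego-motion term from measured per-point radial velocities.

    points:   (N, 3) point positions in the sensor frame [m]
    v_radial: (N,) measured radial velocities [m/s]; assumed convention:
              v_r = (v_point - v_ego) . u, positive when the point recedes
    v_ego:    (3,) linear velocity of the sensor origin, in the sensor frame [m/s]

    Returns (N,) radial components of each point's own velocity; static
    structure should come out near zero.
    """
    points = np.asarray(points, dtype=float)
    ranges = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / np.clip(ranges, 1e-6, None)  # line-of-sight unit vectors
    # Rearranging the assumed convention: v_point . u = v_r + v_ego . u
    return np.asarray(v_radial, dtype=float) + unit @ np.asarray(v_ego, dtype=float)
```

Take an ego vehicle driving at 10 m/s along +x. A static point 20 m ahead reads -10 m/s and compensates to 0, while a car ahead matching the ego speed reads 0 and compensates to 10 m/s.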
If this is right
- Velocity-aware models achieve higher precision on 3D detection of pedestrians and cyclists than geometry-only baselines.
- Motion forecasting and planning modules trained with the velocity channel reduce error in congested traffic and unprotected turns.
- The dataset's persistent track IDs across frames enable consistent evaluation of multi-frame flow and trajectory tasks.
- Multi-Lidar fusion pipelines can combine the radial-velocity channel of the forward-facing 4D sensor with the surround coverage of the rotating, solid-state, and blind-spot Lidars.
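The velocity-aware versus geometry-only comparison behind these implications amounts to toggling one input channel. The following is a minimal sketch, assuming per-point NumPy arrays and an ego-compensated radial velocity. The function names and the 0.5 m/s threshold are hypothetical. The mask also shows why the cue is complementary: a single frame can flag moving points that geometry alone cannot, although purely tangential motion produces no radial component.

```python
import numpy as np

def build_point_features(xyz, intensity, v_radial_comp=None, use_velocity=True):
    """Stack per-point features for a point-cloud encoder (hypothetical helper).

    Geometry-only:  [x, y, z, intensity]
    Velocity-aware: [x, y, z, intensity, v_r]  (ego-compensated radial velocity)
    """
    feats = [np.asarray(xyz, dtype=np.float32),
             np.asarray(intensity, dtype=np.float32)[:, None]]
    if use_velocity:
        if v_radial_comp is None:
            raise ValueError("velocity-aware features need v_radial_comp")
        feats.append(np.asarray(v_radial_comp, dtype=np.float32)[:, None])
    return np.concatenate(feats, axis=1)


def moving_point_mask(v_radial_comp, threshold=0.5):
    """Flag points whose |ego-compensated radial velocity| exceeds threshold (m/s).

    Motion perpendicular to the line of sight has no radial component,
    so movers crossing the beam can be missed by this test.
    """
    return np.abs(np.asarray(v_radial_comp)) > threshold
```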
Where Pith is reading between the lines
- The same velocity channel could be tested for robustness gains in adverse weather or nighttime conditions not covered in the Beijing collection.
- Persistent track annotations open the door to longer-horizon trajectory prediction models that build directly on the provided labels.
- Combining the velocity measurements with camera semantics might further reduce false positives on vulnerable road users.
Load-bearing premise
The hybrid auto-labeling plus human refinement process produces sufficiently accurate 3D bounding-box annotations with consistent track IDs, and the chosen Beijing urban scenes are representative of the conditions where velocity cues provide the claimed gains.
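One statistic that could test this premise is the identity-switch count from the CLEAR MOT metrics. It would use the human-refined tracks as the reference and the auto-labeled tracks as the labels under test. This minimal sketch assumes that per-frame matching between the two label sets has already been done, for example by box overlap. The input format and function name are hypothetical. IDF1 or HOTA would give complementary views of the same consistency.

```python
def count_id_switches(frames):
    """Count identity switches of auto-labeled track IDs against reference tracks.

    frames: list with one dict per frame, mapping reference_track_id -> auto_track_id
            for objects already matched in that frame (matching assumed done upstream).
    Returns (id_switches, total_matches).
    """
    last_assigned = {}  # reference track -> most recent auto ID it was matched to
    switches = 0
    matches = 0
    for frame in frames:
        for ref_id, auto_id in frame.items():
            matches += 1
            prev = last_assigned.get(ref_id)
            if prev is not None and prev != auto_id:
                switches += 1
            last_assigned[ref_id] = auto_id
    return switches, matches
```

Dividing switches by matches gives a rate that can be compared across sequences of different lengths.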
What would settle it
A re-run of the motion-forecasting benchmark on a held-out set of fast-moving objects or pedestrians that shows no accuracy gain when velocity channels are added would falsify the complementary-cue claim.
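This test can be stated with the standard displacement metrics. The sketch assumes BEV trajectories stored as (N, T, 2) arrays and a current speed for each agent. The 8 m/s cut-off for "fast-moving" and the helper names are hypothetical choices, not the paper's benchmark definition. A gain at or below zero on such a held-out subset would be the falsifying outcome.

```python
import numpy as np

def ade_fde(pred, gt):
    """Per-agent average and final displacement error.

    pred, gt: (N, T, 2) predicted and ground-truth BEV positions in metres.
    Returns two (N,) arrays: ADE (mean over T steps) and FDE (last step).
    """
    dist = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)  # (N, T)
    return dist.mean(axis=1), dist[:, -1]


def fast_agent_ade_gain(pred_geo, pred_vel, gt, speeds, min_speed=8.0):
    """Mean ADE of the geometry-only model minus that of the velocity-aware model,
    restricted to agents faster than min_speed (m/s). Positive = velocity helps.
    """
    idx = np.flatnonzero(np.asarray(speeds) > min_speed)
    if idx.size == 0:
        raise ValueError("no agents above the speed threshold")
    ade_geo, _ = ade_fde(pred_geo[idx], gt[idx])
    ade_vel, _ = ade_fde(pred_vel[idx], gt[idx])
    return float(ade_geo.mean() - ade_vel.mean())
```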
Original abstract
We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocity measurements from a forward-facing 4D FMCW Lidar, together with multiple Lidars of different types (rotating, solid-state, and blind-spot variants), surround-view cameras, and 6-DOF ego-vehicle poses. The dataset was collected in complex urban environments in Beijing and covers dense pedestrian interactions, congested traffic, high-speed driving, and unprotected maneuvers. 4DLidarOpen provides synchronized multi-sensor data and 3D bounding-box annotations with persistent track IDs across five object categories. A hybrid annotation strategy is adopted, where large-scale auto-labeled data support scalable training and human experts refine annotations for the human-annotated training and validation sets. Based on this dataset, we establish benchmarks for 3D object detection, bird's-eye view (BEV) segmentation and flow prediction, and motion forecasting with planning. Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding. Compared with geometric-only sensing, the velocity-aware representation improves motion-related perception and downstream forecasting and planning, especially in scenarios involving vulnerable road users and fast-moving objects. These results indicate that 4D FMCW Lidar is a promising sensing modality for motion-aware autonomous driving. The dataset and evaluation toolkit are publicly released to support research on 4D scene understanding, multi-Lidar fusion, and velocity-aware perception and planning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving centered on 4D FMCW Lidar that provides point-wise radial velocity measurements in addition to geometric data. Collected in complex Beijing urban scenes, it includes synchronized data from multiple Lidar types, surround-view cameras, and 6-DOF ego poses, along with 3D bounding-box annotations and persistent track IDs for five object categories generated via a hybrid auto-labeling and human-refinement pipeline. Benchmarks are established for 3D object detection, BEV segmentation and flow prediction, and motion forecasting with planning; experiments indicate that incorporating direct velocity measurements yields complementary motion cues that improve performance on motion-related tasks relative to geometric-only sensing, particularly for vulnerable road users and fast-moving objects.
Significance. If the central results hold after addressing validation gaps, the work is significant as the first public release of an open 4D FMCW Lidar dataset with native velocity data, directly supporting research on velocity-aware perception, multi-Lidar fusion, and motion forecasting in autonomous driving. The public release of the dataset and evaluation toolkit is a clear strength that promotes reproducibility and community follow-on work. The experiments provide initial evidence that velocity measurements offer benefits beyond geometry in dynamic scenes, which could influence sensor selection for future AV systems if the quantitative gains are robustly demonstrated.
major comments (2)
- §3.2 (Annotation Pipeline): The hybrid auto-labeling plus human-refinement process is described at a high level but reports no quantitative metrics on track-ID consistency across frames, label error rates for fast-moving or occluded objects, or inter-annotator agreement. This is load-bearing for the central claim because all motion-forecasting and planning benchmarks rely on accurate persistent track IDs; without these validation statistics, observed gains from adding radial velocity could be confounded by annotation noise or drift rather than true complementary sensing cues.
- §5 (Experiments and Benchmarks): The abstract and results sections claim that velocity-aware representations improve downstream forecasting and planning, yet provide no specific quantitative deltas, error bars, ablation tables, or statistical significance tests comparing velocity-inclusive versus geometric-only inputs. Concrete numbers from the relevant tables or figures are required to evaluate effect sizes and rule out post-hoc scenario selection effects.
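A paired bootstrap over evaluation samples is one way to supply the requested error bars. It suits this comparison because both models are scored on the same samples. The sketch below uses hypothetical names. Resampling individual samples ignores correlation within a driving scene, so resampling whole scenes (a cluster bootstrap) would give more conservative intervals.

```python
import numpy as np

def paired_bootstrap_ci(err_geo, err_vel, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired error reduction.

    err_geo, err_vel: per-sample errors (e.g. ADE) of the geometry-only and
    velocity-aware models on the same evaluation samples, in the same order.
    Returns (mean_delta, lower, upper) with delta = err_geo - err_vel, so a
    positive value means the velocity-aware model has lower error.
    """
    delta = np.asarray(err_geo, dtype=float) - np.asarray(err_vel, dtype=float)
    n = delta.size
    rng = np.random.default_rng(seed)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        # Resample sample indices with replacement, keeping each pair intact.
        boot_means[b] = delta[rng.integers(0, n, size=n)].mean()
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(delta.mean()), float(lower), float(upper)
```

An interval that excludes zero suggests that the gain is unlikely to come from resampling noise alone. It does not by itself address post-hoc scenario selection.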
minor comments (2)
- Abstract: The summary of experimental findings would be strengthened by including one or two key quantitative results (e.g., mAP or forecasting error reductions) rather than qualitative statements alone.
- §2 (Related Work): Explicit side-by-side comparison with prior datasets (KITTI, nuScenes, Waymo) regarding availability of native velocity channels would clarify the novelty of the 4D FMCW contribution.
Simulated Author's Rebuttal
We are grateful to the referee for the thoughtful comments, which have helped us identify areas for improvement in the manuscript. We provide point-by-point responses to the major comments below and indicate the revisions we plan to make.
point-by-point responses
- Referee: §3.2 (Annotation Pipeline): The hybrid auto-labeling plus human-refinement process is described at a high level but reports no quantitative metrics on track-ID consistency across frames, label error rates for fast-moving or occluded objects, or inter-annotator agreement. This is load-bearing for the central claim because all motion-forecasting and planning benchmarks rely on accurate persistent track IDs; without these validation statistics, observed gains from adding radial velocity could be confounded by annotation noise or drift rather than true complementary sensing cues.
  Authors: We thank the referee for highlighting this important aspect. The annotation pipeline is indeed critical for the validity of the motion-related benchmarks. While the manuscript provides a high-level description, we acknowledge the lack of quantitative metrics. In the revised version, we will include a dedicated subsection in §3.2 reporting track-ID consistency across frames (e.g., percentage of tracks maintained over sequences), estimated label error rates for challenging cases like fast-moving and occluded objects, and inter-annotator agreement scores from the human refinement process. This will help confirm that the observed benefits from velocity measurements are not confounded by annotation issues. Revision: yes.
- Referee: §5 (Experiments and Benchmarks): The abstract and results sections claim that velocity-aware representations improve downstream forecasting and planning, yet provide no specific quantitative deltas, error bars, ablation tables, or statistical significance tests comparing velocity-inclusive versus geometric-only inputs. Concrete numbers from the relevant tables or figures are required to evaluate effect sizes and rule out post-hoc scenario selection effects.
  Authors: We agree that the presentation of the experimental results can be strengthened by providing more explicit quantitative comparisons. In the revised manuscript, we will update §5 to include specific numerical deltas between velocity-inclusive and geometric-only models, add error bars where appropriate, present detailed ablation tables, and report statistical significance tests for the observed improvements. This will allow readers to better assess the effect sizes and robustness of the findings. Revision: yes.
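Inter-annotator agreement for 3D boxes is often summarized by the IoU between two annotators' boxes for the same object. The sketch below assumes boxes given as (cx, cy, cz, l, w, h, yaw), with yaw about the vertical axis. This layout is an assumption, not the dataset's documented format. The footprint overlap uses Sutherland-Hodgman polygon clipping, which is valid here because both footprints are convex.

```python
import math

def _bev_corners(box):
    """Counter-clockwise footprint corners of a (cx, cy, cz, l, w, h, yaw) box."""
    cx, cy, _, l, w, _, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(l / 2, w / 2), (-l / 2, w / 2), (-l / 2, -w / 2), (l / 2, -w / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); >= 0 when b is left of or on o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _line_intersection(p, q, a, b):
    """Intersection of segment p->q with the infinite line through a and b."""
    d1 = (q[0] - p[0], q[1] - p[1])
    d2 = (b[0] - a[0], b[1] - a[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if denom == 0.0:  # numerically parallel; only reachable in degenerate cases
        return p
    t = ((a[0] - p[0]) * d2[1] - (a[1] - p[1]) * d2[0]) / denom
    return (p[0] + t * d1[0], p[1] + t * d1[1])


def _clip_convex(subject, clip):
    """Sutherland-Hodgman: clip polygon `subject` by convex CCW polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            cur_in, prev_in = _cross(a, b, cur) >= 0, _cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_line_intersection(prev, cur, a, b))
            prev = cur
    return output


def _area(poly):
    """Shoelace area of a simple polygon given as a list of (x, y)."""
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])))


def iou_3d(box_a, box_b):
    """3D IoU of two yaw-rotated, gravity-aligned boxes (cx, cy, cz, l, w, h, yaw)."""
    inter_bev = _area(_clip_convex(_bev_corners(box_a), _bev_corners(box_b)))
    za0, za1 = box_a[2] - box_a[5] / 2, box_a[2] + box_a[5] / 2
    zb0, zb1 = box_b[2] - box_b[5] / 2, box_b[2] + box_b[5] / 2
    inter = inter_bev * max(0.0, min(za1, zb1) - max(za0, zb0))
    union = box_a[3] * box_a[4] * box_a[5] + box_b[3] * box_b[4] * box_b[5] - inter
    return inter / union if union > 0 else 0.0
```

Agreement could then be reported as the mean IoU over objects labeled by both annotators, or as the fraction of pairs above a chosen threshold.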
Circularity Check
Dataset release with empirical benchmarks; no circular derivation chain
full rationale
The paper releases raw multi-modal sensor data, 3D bounding-box annotations with track IDs, and runs standard benchmarks for detection, segmentation, flow prediction, and motion forecasting on fixed splits. The central claim—that radial velocity measurements provide complementary cues—is supported by direct experimental comparisons of velocity-aware versus geometric-only inputs on the collected Beijing urban scenes. No equations, fitted parameters, or self-citations are used to define or force the reported improvements; the results are falsifiable against the public dataset and external validation. This is a standard empirical dataset contribution with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: multi-sensor data streams are accurately time-synchronized and spatially calibrated.
- Domain assumption: hybrid auto-labeling followed by human refinement yields sufficiently accurate 3D bounding boxes and persistent track IDs.
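The first assumption covers steps such as pairing each reference sweep with the nearest frame of every other sensor. The sketch below assumes that all streams are stamped against a common clock and stored as sorted lists of seconds. The tolerance and function name are hypothetical, and the dataset's actual synchronization method is not described here. Frames outside the tolerance are reported as unmatched rather than silently paired.

```python
import bisect

def match_nearest(ref_stamps, other_stamps, tol):
    """For each reference timestamp, find the nearest timestamp in another stream.

    ref_stamps, other_stamps: sorted timestamps in seconds on a shared clock
    tol: maximum allowed offset in seconds; larger gaps yield None
    Returns a list with one index into other_stamps (or None) per reference stamp.
    """
    matches = []
    for t in ref_stamps:
        i = bisect.bisect_left(other_stamps, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_stamps)]
        if not candidates:
            matches.append(None)
            continue
        j = min(candidates, key=lambda k: abs(other_stamps[k] - t))
        matches.append(j if abs(other_stamps[j] - t) <= tol else None)
    return matches
```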
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
End-to-end autonomous driving: Challenges and frontiers,
L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
work page 2024
-
[2]
A survey on vision-language- action models for autonomous driving,
S. Jiang, Z. Huang, K. Qian, Z. Luo, T. Zhu, Y . Zhong, Y . Tang, M. Kong, Y . Wang, S. Jiaoet al., “A survey on vision-language- action models for autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4524–4536
work page 2025
-
[3]
K. Qian, S. Jiang, Y . Zhong, Z. Luo, Z. Huang, T. Zhu, K. Jiang, M. Yang, Z. Fu, J. Miaoet al., “Agentthink: A unified framework for tool-augmented chain-of-thought reasoning in vision-language models for autonomous driving,”arXiv preprint arXiv:2505.15298, vol. 1, no. 2, p. 3, 2025
-
[4]
4d-are: Bridging the attribution gap in llm agent requirements engineering,
B. Yu and L. Zhao, “4d-are: Bridging the attribution gap in llm agent requirements engineering,” 2026. [Online]. Available: https://arxiv.org/abs/2601.04556
-
[5]
Y . Shi, K. Jiang, K. Wang, J. Li, Y . Wang, M. Yang, and D. Yang, “Streamingflow: Streaming occupancy forecasting with asynchronous multi-modal data streams via neural ordinary differential equation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 833–14 842
work page 2024
-
[6]
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, S. Zhang, C. Huang, C. Liu, and X. Wang, “Vadv2: End-to-end vectorized autonomous driving via probabilistic planning,”arXiv preprint arXiv:2402.13243, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[7]
Driveworld: 4d pre-trained scene understanding via world models for autonomous driving,
C. Min, D. Zhao, L. Xiao, J. Zhao, X. Xu, Z. Zhu, L. Jin, J. Li, Y . Guo, J. Xinget al., “Driveworld: 4d pre-trained scene understanding via world models for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 15 522–15 533
work page 2024
-
[8]
World4drive: End-to-end autonomous driving via intention-aware physical latent world model,
Y . Zheng, P. Yang, Z. Xing, Q. Zhang, Y . Zheng, Y . Gao, P. Li, T. Zhang, Z. Xia, P. Jiaet al., “World4drive: End-to-end autonomous driving via intention-aware physical latent world model,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 28 632–28 642
work page 2025
-
[9]
Vision meets robotics: The KITTI dataset,
A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,”The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, Sep. 2013
work page 2013
-
[10]
nuscenes: A multi- modal dataset for autonomous driving,
H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multi- modal dataset for autonomous driving,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
work page 2020
-
[11]
Are we ready for autonomous driving? The KITTI vision benchmark suite,
A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2012
work page 2012
-
[12]
PointPillars: Fast encoders for object detection from point clouds,
A. H. Lang, S. V ora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast encoders for object detection from point clouds,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
work page 2019
-
[13]
Center-based 3d object detection and tracking,
T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3d object detection and tracking,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. SUBMITTED TO IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 14
work page 2021
-
[14]
Semantickitti: A dataset for semantic scene understanding of lidar sequences,
J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019
work page 2019
-
[15]
Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. Rus, and S. Han, “Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,”arXiv preprint arXiv:2205.13542, 2022
-
[16]
Learning lane graph representations for motion forecasting,
M. Liang, B. Yang, R. Hu, Y . Chen, R. Liao, S. Feng, and R. Urtasun, “Learning lane graph representations for motion forecasting,” inECCV, 2020
work page 2020
-
[17]
Multi-head attention for multi-modal joint vehicle motion forecasting,
J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and G. P. Gil, “Multi-head attention for multi-modal joint vehicle motion forecasting,” inICRA. IEEE, 2020
work page 2020
-
[18]
The Cityscapes Dataset for Semantic Urban Scene Understanding,
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benen- son, U. Franke, S. Roth, and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213–3223
work page 2016
-
[19]
TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents
Y . Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha, “Trafficpredict: Trajectory prediction for heterogeneous traffic-agents,” CoRR, vol. abs/1811.02146, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[20]
Scalability in Percep- tion for Autonomous Driving: Waymo Open Dataset,
P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caine, V . Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y . Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in Percep- tion for Autonomous Driving: Waymo Open Dataset,” inProceedings of the IEEE/...
work page 2020
-
[21]
Coda: A real-world road corner case dataset for object detection in autonomous driving,
K. Li, K. Chen, H. Wang, L. Hong, C. Ye, J. Han, Y . Chen, W. Zhang, C. Xu, D.-Y . Yeunget al., “Coda: A real-world road corner case dataset for object detection in autonomous driving,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 406–423
work page 2022
-
[22]
Pandaset: Advanced sensor suite dataset for autonomous driving,
P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jianget al., “Pandaset: Advanced sensor suite dataset for autonomous driving,” in2021 IEEE international intelligent transportation systems conference (ITSC). IEEE, 2021, pp. 3095–3101
work page 2021
-
[23]
Carla: An open urban driving simulator,
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “Carla: An open urban driving simulator,” inConference on robot learning. PMLR, 2017, pp. 1–16
work page 2017
-
[24]
K. Qian, J. Miao, Z. Luo, Z. Fu, J. Li, Y . Shi, Y . Wang, K. Jiang, M. Yang, and D. Yang, “Lego-motion: Learning-enhanced grids with occupancy instance modeling for class-agnostic motion prediction,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 14 178–14 185
work page 2025
-
[25]
A survey of motion planning and control techniques for self-driving urban vehicles,
B. Paden, M. ˇC´ap, S. Z. Yong, D. Yershov, and E. Frazzoli, “A survey of motion planning and control techniques for self-driving urban vehicles,” IEEE Transactions on intelligent vehicles, vol. 1, no. 1, pp. 33–55, 2016
work page 2016
-
[26]
End to End Learning for Self-Driving Cars
M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhanget al., “End to end learning for self-driving cars,”arXiv preprint arXiv:1604.07316, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[27]
End-to-end driving via conditional imitation learning,
F. Codevilla, M. M ¨uller, A. L ´opez, V . Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 4693–4700
work page 2018
-
[28]
End-to-end learning of driving models from large-scale video datasets,
H. Xu, Y . Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174– 2182
work page 2017
-
[29]
A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V .-D. Lam, A. Bewley, and A. Shah, “Learning to drive in a day,” in2019 international conference on robotics and automation (ICRA). IEEE, 2019, pp. 8248–8254
work page 2019
-
[30]
Proximal Policy Optimization Algorithms
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[31]
Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,
Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han, “Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,” in2023 IEEE international conference on robotics and automation (ICRA). IEEE, 2023, pp. 2774–2781
work page 2023
-
[32]
Multi-modal fusion transformer for end-to-end autonomous driving,
A. Prakash, K. Chitta, and A. Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7077–7087
work page 2021
-
[33]
Transfuser: Imitation with transformer-based sensor fusion for au- tonomous driving,
K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “Transfuser: Imitation with transformer-based sensor fusion for au- tonomous driving,”IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 11, pp. 12 878–12 895, 2022
work page 2022
-
[34]
Vad: Vectorized scene representation for efficient autonomous driving,
B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, S. Zhang, C. Liu, C. Huang, X. Wanget al., “Vad: Vectorized scene representation for efficient autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 8340– 8350
work page 2023
-
[35]
An lstm network for highway trajectory prediction,
F. Altch ´e and A. de La Fortelle, “An lstm network for highway trajectory prediction,” in2017 IEEE 20th international conference on intelligent transportation systems (ITSC). IEEE, 2017, pp. 353–359
work page 2017
-
[36]
Multi-task learning with deep neural networks: A survey,
M. Crawshaw, “Multi-task learning with deep neural networks: A survey,”arXiv preprint arXiv:2009.09796, 2020
-
[37]
Multi-task learning with attention for end-to-end autonomous driving,
K. Ishihara, A. Kanervisto, J. Miura, and V . Hautamaki, “Multi-task learning with attention for end-to-end autonomous driving,” inPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 2902–2911
work page 2021
-
[38]
Planning-oriented autonomous driving,
Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wanget al., “Planning-oriented autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17 853–17 862
work page 2023
-
[39]
Vavim and vavam: Autonomous driving through video generative modeling
F. Bartoccioni, E. Ramzi, V . Besnier, S. Venkataramanan, T.-H. Vu, Y . Xu, L. Chambon, S. Gidaris, S. Odabas, D. Hurychet al., “Vavim and vavam: Autonomous driving through video generative modeling,” arXiv preprint arXiv:2502.15672, 2025
-
[40]
Scalable diffusion models with transformers,
W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4195–4205
work page 2023
-
[41]
Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,
B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y . Zhang, Q. Zhang, and X. Wang, “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,”arXiv preprint arXiv:2411.15139, 2024
-
[42]
Hidden biases of end-to-end driving models,
B. Jaeger, K. Chitta, and A. Geiger, “Hidden biases of end-to-end driving models,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8240–8249
work page 2023
-
[43]
End-to-end interpretable neural motion planner,
W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8660–8669
work page 2019
-
[44]
Perceive, predict, and plan: Safe motion planning through interpretable semantic representations,
A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, “Perceive, predict, and plan: Safe motion planning through interpretable semantic representations,” inComputer Vision–ECCV 2020: 16th Euro- pean Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16. Springer, 2020, pp. 414–430
work page 2020
-
[45]
Direct preference optimization: Your language model is secretly a reward model,
R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,”Advances in Neural Information Processing Systems, vol. 36, pp. 53 728–53 741, 2023
work page 2023
-
[46]
LLaMA: Open and Efficient Foundation Language Models
H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azharet al., “Llama: Open and efficient foundation language models,”arXiv preprint arXiv:2302.13971, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[47]
Llama 2: Open Foundation and Fine-Tuned Chat Models
H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosaleet al., “Llama 2: Open foundation and fine-tuned chat models,”arXiv preprint arXiv:2307.09288, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[48]
Learning transferable visual models from natural language supervision,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763
work page 2021
-
[49]
GPT-Driver: Learning to Drive with GPT
J. Mao, Y . Qian, J. Ye, H. Zhao, and Y . Wang, “Gpt-driver: Learning to drive with gpt,”arXiv preprint arXiv:2310.01415, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[50]
W. Wang, J. Xie, C. Hu, H. Zou, J. Fan, W. Tong, Y . Wen, S. Wu, H. Deng, Z. Liet al., “Drivemlm: Aligning multi-modal large language models with behavioral planning states for autonomous driving,”arXiv preprint arXiv:2312.09245, 2023
-
[51]
Lmdrive: Closed-loop end-to-end driving with large language models,
H. Shao, Y . Hu, L. Wang, G. Song, S. L. Waslander, Y . Liu, and H. Li, “Lmdrive: Closed-loop end-to-end driving with large language models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 120–15 130
work page 2024
-
[52]
A language agent for autonomous driving,
J. Mao, J. Ye, Y . Qian, M. Pavone, and Y . Wang, “A language agent for autonomous driving,”arXiv preprint arXiv:2311.10813, 2023
-
[53]
X. Hou, W. Wang, L. Yang, H. Lin, J. Feng, H. Min, and X. Zhao, “Driveagent: Multi-agent structured reasoning with llm and multimodal sensor fusion for autonomous driving,”arXiv preprint arXiv:2505.02123, 2025. SUBMITTED TO IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 15
-
[54]
Dilu: A knowledge-driven approach to au- tonomous driving with large language models
L. Wen, D. Fu, X. Li, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, and Y . Qiao, “Dilu: A knowledge-driven approach to autonomous driving with large language models,”arXiv preprint arXiv:2309.16292, 2023
-
[55]
Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,
K. Jiang, X. Cai, Z. Cui, A. Li, Y . Ren, H. Yu, H. Yang, D. Fu, L. Wen, and P. Cai, “Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models,”IEEE Transactions on Intelligent Vehicles, 2024
work page 2024
-
[56]
Instruct large language models to drive like humans,
R. Zhang, X. Guo, W. Zheng, C. Zhang, K. Keutzer, and L. Chen, “Instruct large language models to drive like humans,”arXiv preprint arXiv:2406.07296, 2024
-
[57]
A. Ishaq, J. Lahoud, K. More, O. Thawakar, R. Thawkar, D. Dis- sanayake, N. Ahsan, Y . Li, F. S. Khan, H. Cholakkalet al., “Drivelmm- o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding,”arXiv preprint arXiv:2503.10621, 2025
-
[58]
Drivemm: All-in-one large multimodal model for autonomous driving,
Z. Huang, C. Feng, F. Yan, B. Xiao, Z. Jie, Y . Zhong, X. Liang, and L. Ma, “Drivemm: All-in-one large multimodal model for autonomous driving,”arXiv preprint arXiv:2412.07689, 2024
-
[59]
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Senna: Bridging large vision-language models and end-to-end autonomous driving,”arXiv preprint arXiv:2410.22313, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[60]
Otter: A vision-language-action model with text-aware visual feature extraction
H. Huang, F. Liu, L. Fu, T. Wu, M. Mukadam, J. Malik, K. Goldberg, and P. Abbeel, “Otter: A vision-language-action model with text-aware visual feature extraction,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03734
-
[61]
Z. Zhou, T. Cai, S. Z. Zhao, Y . Zhang, Z. Huang, B. Zhou, and J. Ma, “Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13757
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[62]
arXiv preprint arXiv:2505.04769 (2025)
R. Sapkota, Y . Cao, K. I. Roumeliotis, and M. Karkee, “Vision-language- action models: Concepts, progress, applications and challenges,”arXiv preprint arXiv:2505.04769, 2025
-
[63]
A Survey on Vision-Language-Action Models for Embodied AI
Y . Ma, Z. Song, Y . Zhuang, J. Hao, and I. King, “A survey on vision-language-action models for embodied ai,”arXiv preprint arXiv:2405.14093, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[64]
H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” arXiv preprint arXiv:2503.19755, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[65]
Covla: Comprehensive vision-language-action dataset for autonomous driving,
H. Arai, K. Miwa, K. Sasaki, K. Watanabe, Y . Yamaguchi, S. Aoki, and I. Yamamoto, “Covla: Comprehensive vision-language-action dataset for autonomous driving,” in2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 1933–1943
work page 2025
-
[66] Z. Chen, M. Ye, S. Xu, T. Cao, and Q. Chen, “Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,” in European Conference on Computer Vision. Springer, 2024, pp. 239–256.

[67] X. Weng, B. Ivanovic, Y. Wang, Y. Wang, and M. Pavone, “Para-drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15449–15458.

[68] Y. Miyaoka, M. Inoue, and T. Nii, “Chatmpc: Natural language based mpc personalization,” in 2024 American Control Conference (ACC). IEEE, 2024, pp. 3598–3603.

[69] K. Long, H. Shi, J. Liu, and X. Li, “Vlm-mpc: Vision language foundation model (vlm)-guided model predictive controller (mpc) for autonomous driving,” arXiv preprint arXiv:2408.04821, 2024.

[70] A. Naumann, X. Gu, T. Dimlioglu, M. Bojarski, A. Degirmenci, A. Popov, D. Bisla, M. Pavone, U. Müller, and B. Ivanovic, “Data scaling laws for end-to-end autonomous driving,” arXiv preprint arXiv:2504.04338, 2025.

[71] Y. Zheng, Z. Xia, Q. Zhang, T. Zhang, B. Lu, X. Huo, C. Han, Y. Li, M. Yu, B. Jin et al., “Preliminary investigation into data scaling laws for imitation learning-based end-to-end autonomous driving,” arXiv preprint arXiv:2412.02689, 2024.

[72] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631.

[73] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454.

[74] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan et al., “Argoverse: 3d tracking and forecasting with rich maps,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8748–8757.

[75] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” arXiv preprint arXiv:2301.00493, 2023.

[76] Y. Li and J. Ibanez-Guzman, “Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,” IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, 2020.

[77] M. Jung, W. Yang, D. Lee, H. Gil, G. Kim, and A. Kim, “Helipr: Heterogeneous lidar dataset for inter-lidar place recognition under spatiotemporal variations,” The International Journal of Robotics Research, vol. 43, no. 12, pp. 1867–1883, 2024.

[78] B. Hexsel, H. Vhavle, and Y. Chen, “DICP: Doppler Iterative Closest Point Algorithm,” in Proceedings of Robotics: Science and Systems, New York City, NY, USA, June 2022.

[79] Y. Zeng, Y. Yu, S. Qi, and T. Wu, “Tracking 3d moving objects as centroids using fmcw lidar,” in Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems (4th ICAUS 2024), L. Liu, Y. Niu, W. Fu, and Y. Qu, Eds. Singapore: Springer Nature Singapore, 2025, pp. 536–545.

[80] D. J. Yoon, Y. Chen, H. Vhavle, J. Reuther, and T. D. Barfoot, “Towards fast correspondence-free odometry using multiple fmcw lidars,” IEEE Robotics and Automation Letters, vol. 10, no. 9, pp. 9088–9095, 2025.