VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
Pith reviewed 2026-05-24 03:13 UTC · model grok-4.3
The pith
VADv2 models end-to-end driving planning as a probabilistic distribution over a tokenized action vocabulary instead of deterministic regression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VADv2 is a probabilistic planning model for end-to-end autonomous driving. It discretizes the high-dimensional, continuous spatiotemporal action space into a large planning vocabulary, tokenizes that vocabulary into planning tokens, lets the tokens interact with scene tokens to output a probability distribution over actions, and supervises the distribution with large-scale driving demonstrations.
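The discretization step can be illustrated with a small sketch. The abstract does not say how the vocabulary is chosen, so the farthest-point sampling used here is an assumption made purely for illustration, as are the function name `build_vocabulary`, the random-walk demonstrations, and the vocabulary size of 32.

```python
import numpy as np

def build_vocabulary(trajectories: np.ndarray, vocab_size: int) -> np.ndarray:
    """Pick `vocab_size` representative trajectories by farthest-point sampling.

    trajectories: (num_demos, horizon, 2) array of future (x, y) waypoints.
    Returns an array of shape (vocab_size, horizon, 2).
    NOTE: the selection rule is an assumption, not the paper's stated method.
    """
    flat = trajectories.reshape(len(trajectories), -1)
    chosen = [0]  # start from an arbitrary demonstration
    # Distance from every trajectory to its nearest already-chosen trajectory.
    min_dist = np.linalg.norm(flat - flat[0], axis=1)
    while len(chosen) < vocab_size:
        nxt = int(np.argmax(min_dist))  # the one farthest from the current set
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(flat - flat[nxt], axis=1))
    return trajectories[chosen]

rng = np.random.default_rng(0)
# Hypothetical demonstrations: 500 random-walk trajectories, 6 waypoints each.
demos = np.cumsum(rng.normal(size=(500, 6, 2)), axis=1)
vocab = build_vocabulary(demos, vocab_size=32)
print(vocab.shape)  # (32, 6, 2)
```

Farthest-point sampling spreads the vocabulary over the observed trajectories, including rare ones. A clustering method would instead concentrate entries where demonstrations are dense.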
What carries the argument
A probabilistic field function that maps the tokenized planning vocabulary, after interaction with scene tokens, to a probability distribution over actions.
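One way to picture this mapping is a single cross-attention step followed by a scoring head. This is a minimal sketch under stated assumptions, not the authors' architecture: the single attention head, the residual connection, the linear head `w_out`, and the softmax normalization across the vocabulary are all hypothetical choices.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_distribution(plan_tokens, scene_tokens, w_out):
    """Score each planning token against the scene and normalize.

    plan_tokens:  (V, d) one embedding per vocabulary entry
    scene_tokens: (S, d) embeddings of map elements, agents, etc.
    w_out:        (d,)   hypothetical linear scoring head
    Returns a (V,) vector of probabilities over the vocabulary.
    """
    d = plan_tokens.shape[-1]
    # Each planning token attends over all scene tokens.
    attn = softmax(plan_tokens @ scene_tokens.T / np.sqrt(d), axis=-1)  # (V, S)
    fused = plan_tokens + attn @ scene_tokens  # residual update, (V, d)
    logits = fused @ w_out                     # one score per action, (V,)
    return softmax(logits)

rng = np.random.default_rng(0)
plan_tokens = rng.normal(size=(8, 16))
scene_tokens = rng.normal(size=(5, 16))
w_out = rng.normal(size=16)
probs = action_distribution(plan_tokens, scene_tokens, w_out)
print(probs.shape, probs.sum())
```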
If this is right
- Achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark.
- Leads the Bench2Drive benchmark.
- Demonstrates effectiveness on NAVSIM and a large-scale 3DGS-based benchmark for real-world applications.
- Better copes with planning uncertainty than existing deterministic methods.
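The last point can be illustrated with a toy case that is not taken from the paper. Assume half of the demonstrations pass an obstacle on the left and half on the right. The lateral offsets, the three-entry vocabulary, and the nearest-entry assignment are hypothetical.

```python
import numpy as np

# Hypothetical demonstrations: lateral offset (m) used to pass an obstacle at 0.
demos = np.array([-2.0] * 50 + [2.0] * 50)

# Deterministic regression under an L2 loss converges to the sample mean.
regressed = demos.mean()

# A categorical model over a small vocabulary converges to empirical frequencies.
vocab = np.array([-2.0, 0.0, 2.0])
nearest = np.abs(demos[:, None] - vocab[None, :]).argmin(axis=1)
probs = np.bincount(nearest, minlength=len(vocab)) / len(demos)

print(regressed)  # 0.0: the averaged action points straight at the obstacle
print(probs)      # mass 0.5 on each side, none on the middle entry
```

The regression target averages two valid modes into an action no demonstrator took, while the categorical distribution keeps both modes separate.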
Where Pith is reading between the lines
- Sampling from the learned distribution could support safer multi-hypothesis planning in ambiguous situations.
- The vocabulary approach might allow straightforward scaling to capture rarer driving maneuvers without changing the model architecture.
- The token interaction mechanism could extend naturally to modeling interactions among multiple agents.
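The first speculation above, multi-hypothesis planning, could look like the sketch below. The helper names `top_k_hypotheses` and `sample_action` are hypothetical, and nothing here is claimed to match how VADv2 selects its final action.

```python
import numpy as np

def top_k_hypotheses(probs: np.ndarray, vocab: np.ndarray, k: int):
    """Return the k most probable trajectories and their probabilities."""
    idx = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return vocab[idx], probs[idx]

def sample_action(probs: np.ndarray, vocab: np.ndarray, rng: np.random.Generator):
    """Draw one trajectory from the categorical distribution."""
    i = rng.choice(len(probs), p=probs)
    return vocab[i]

# Hypothetical vocabulary of 3 trajectories with 4 waypoints each.
vocab = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)
probs = np.array([0.1, 0.6, 0.3])

trajs, p = top_k_hypotheses(probs, vocab, k=2)
print(p)  # probabilities of the two best hypotheses, highest first
print(sample_action(probs, vocab, np.random.default_rng(0)).shape)  # (4, 2)
```

A downstream safety checker could then reject hypotheses that collide and fall back to the next most probable one.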
Load-bearing premise
Discretizing the continuous spatiotemporal planning action space into a finite vocabulary preserves enough information to model accurate human-like driving behavior under uncertainty.
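This premise can be probed empirically by measuring how far held-out trajectories sit from their nearest vocabulary entry. The sketch below is an illustration on synthetic random-walk data, not the paper's analysis; the function name `quantization_error` and the vocabulary sizes are assumptions.

```python
import numpy as np

def quantization_error(trajs: np.ndarray, vocab: np.ndarray) -> float:
    """Mean distance from each trajectory to its nearest vocabulary entry."""
    a = trajs.reshape(len(trajs), -1)
    b = vocab.reshape(len(vocab), -1)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, V)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(0)
# Hypothetical data: random-walk trajectories with 6 (x, y) waypoints.
pool = np.cumsum(rng.normal(size=(1200, 6, 2)), axis=1)
candidates, held_out = pool[:1000], pool[1000:]

# Nested vocabularies: each larger one contains the smaller ones,
# so the error can only stay the same or shrink as the size grows.
for size in (16, 64, 256, 1000):
    err = quantization_error(held_out, candidates[:size])
    print(size, round(err, 3))
```

Plotting this error against vocabulary size on real driving logs would show where added entries stop reducing information loss.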
What would settle it
A closed-loop benchmark run showing VADv2 fails to outperform deterministic regression baselines on scenarios with high stochasticity such as dense unpredictable traffic.
Original abstract
Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. Existing learning-based planning methods follow a deterministic paradigm to directly regress the action, failing to cope with the uncertainty problem. In this work, we propose a probabilistic planning model for end-to-end autonomous driving, termed VADv2. We resort to a probabilistic field function to model the mapping from the action space to the probabilistic distribution. Since the planning action space is a high-dimensional continuous spatiotemporal space and hard to tackle, we first discretize the planning action space to a large planning vocabulary and then tokenize the planning vocabulary into planning tokens. Planning tokens interact with scene tokens and output the probabilistic distribution of action. Mass driving demonstrations are leveraged to supervise the distribution. VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, significantly outperforming existing methods, and also leads the recent Bench2Drive benchmark. We further provide comprehensive evaluations on NAVSIM and a large-scale 3DGS-based benchmark, demonstrating its effectiveness in real-world applications. Code is available at https://github.com/hustvl/VAD.
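The supervision step described in the abstract can be sketched as a classification loss. This is a hedged illustration: matching each demonstration to its nearest vocabulary entry and using plain cross-entropy are assumptions, and `nearest_token` and `imitation_loss` are hypothetical names.

```python
import numpy as np

def nearest_token(demo: np.ndarray, vocab: np.ndarray) -> int:
    """Index of the vocabulary trajectory closest to the demonstration."""
    d = np.linalg.norm(vocab.reshape(len(vocab), -1) - demo.reshape(-1), axis=1)
    return int(d.argmin())

def imitation_loss(logits: np.ndarray, target: int) -> float:
    """Cross-entropy of softmax(logits) against a one-hot target index."""
    z = logits - logits.max()                # stabilize the exponentials
    log_probs = z - np.log(np.exp(z).sum())  # log-softmax
    return float(-log_probs[target])

# Hypothetical vocabulary: 3 trajectories with 2 waypoints each.
vocab = np.array([
    [[0.0, 0.0], [0.0, 1.0]],   # straight
    [[0.0, 0.0], [1.0, 1.0]],   # veer right
    [[0.0, 0.0], [-1.0, 1.0]],  # veer left
])
demo = np.array([[0.0, 0.0], [0.9, 1.1]])  # demonstrated trajectory
target = nearest_token(demo, vocab)        # closest entry is "veer right"

print(imitation_loss(np.zeros(3), target))                # uniform prediction
print(imitation_loss(np.array([0.0, 5.0, 0.0]), target))  # confident, correct
```

The loss falls as the model places more probability on the vocabulary entry the human driver effectively chose.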
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes VADv2, a probabilistic planning model for end-to-end autonomous driving. It models the mapping from action space to probability distribution via a probabilistic field function. Because the planning action space is high-dimensional and continuous, the method first discretizes it into a large planning vocabulary, tokenizes the vocabulary into planning tokens, and lets these tokens interact with scene tokens to produce the action distribution. Large-scale driving demonstrations supervise the learned distribution. The abstract claims state-of-the-art closed-loop performance on CARLA Town05 (significantly outperforming prior methods) and leadership on the Bench2Drive benchmark, with further results on NAVSIM and a 3DGS-based benchmark. Code is stated to be available.
Significance. If the central claims hold after full verification, the work would be significant for shifting end-to-end driving from deterministic regression to explicit probabilistic modeling of planning uncertainty. The vectorized tokenization approach and use of large demonstration data could improve robustness in non-deterministic scenarios. The public code release is a positive factor for reproducibility.
Major comments (1)
- Abstract: The discretization of the continuous spatiotemporal planning action space into a finite planning vocabulary is presented as essential for probabilistic modeling, yet no vocabulary size, discretization criteria, resolution analysis, or evidence that information loss is negligible is supplied. This step is load-bearing for the claim that the resulting distribution faithfully captures human-like behavior under uncertainty; without these details the SOTA closed-loop performance assertions on CARLA Town05 and Bench2Drive cannot be evaluated.
Simulated Author's Rebuttal
We thank the referee for the detailed review and the opportunity to clarify our work. We address the single major comment below.
- Referee: Abstract: The discretization of the continuous spatiotemporal planning action space into a finite planning vocabulary is presented as essential for probabilistic modeling, yet no vocabulary size, discretization criteria, resolution analysis, or evidence that information loss is negligible is supplied. This step is load-bearing for the claim that the resulting distribution faithfully captures human-like behavior under uncertainty; without these details the SOTA closed-loop performance assertions on CARLA Town05 and Bench2Drive cannot be evaluated.
Authors: We agree that the abstract is a concise summary and therefore omits the concrete vocabulary size, discretization criteria, resolution analysis, and any explicit quantification of information loss. These elements are load-bearing for the probabilistic modeling claim. The full manuscript provides the vocabulary size, the discretization procedure (including spatiotemporal resolution), and supporting analysis in the methods section. To directly address the concern and improve evaluability of the SOTA claims, we will revise the abstract to state the vocabulary size and briefly note the discretization criteria and resolution. This change will be made in the next version.
Revision: yes
Circularity Check
No circularity: abstract describes method without equations, self-citations, or self-referential fitting
The provided abstract outlines a probabilistic planning model that discretizes the continuous action space into a vocabulary, tokenizes it, and supervises the resulting distribution using external mass driving demonstrations. No equations, derivation steps, or load-bearing self-citations appear in the text. Performance claims reference external benchmarks (CARLA Town05, Bench2Drive) rather than internal definitions or fitted inputs renamed as predictions. The approach is presented as a direct engineering solution to uncertainty, with no reduction of outputs to inputs by construction.
Forward citations
Cited by 47 Pith papers
-
4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segm...
-
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
-
SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving
SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.
-
Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving
PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.
-
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
AlphaDrive uses GRPO-based RL rewards and two-stage SFT+RL training on VLMs to improve autonomous driving planning performance and efficiency while producing emergent multimodal capabilities.
-
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives
BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.
-
CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning
CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.
-
Driving Intents Amplify Planning-Oriented Reinforcement Learning
DIAL uses intent-conditioned CFG and multi-intent GRPO to expand and preserve diverse modes in continuous-action preference RL, lifting RFS to 9.14 and surpassing both prior best (8.5) and human demonstration (8.13).
-
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.
-
The DAWN of World-Action Interactive Models
DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.
-
DriveFuture: Future-Aware Latent World Models for Autonomous Driving
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
-
ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution
ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.
-
Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset
Creates LTD dataset for open-ended traffic VQA and trains UniVLT model to achieve SOTA on unified microscopic AD and macroscopic traffic reasoning tasks.
-
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
-
FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving
FeaXDrive improves end-to-end autonomous driving by shifting diffusion planning to a trajectory-centric formulation with curvature-constrained training, drivable-area guidance, and GRPO post-training, yielding stronge...
-
SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving
Multi-ORFT improves closed-loop multi-agent driving planners by coupling scene-consistent diffusion pre-training with stable online RL post-training, reducing collisions and off-road rates while increasing speed on th...
-
Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems
MOSAIC is a scaling-aware data selection framework that outperforms baselines in training end-to-end autonomous driving planners, achieving comparable or better EPDMS scores with up to 80% less data.
-
Orion-Lite: Distilling LLM Reasoning into Efficient Vision-Only Driving Models
Orion-Lite uses latent feature distillation and trajectory supervision to create a vision-only model that surpasses its LLM-based teacher on closed-loop Bench2Drive evaluation, achieving a new SOTA driving score of 80.6.
-
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to pla...
-
DriveLaW: Unifying Planning and Video Generation in a Latent Driving World
DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.
-
Pseudo-Expert Regularized Offline RL for End-to-End Autonomous Driving in Photorealistic Closed-Loop Environments
Pseudo-expert regularized offline RL reduces collisions and improves route completion for camera-based driving models trained on fixed simulator datasets from nuScenes.
-
SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
SpaceDrive integrates 3D positional encodings derived from depth and ego-states into VLMs, replacing digit tokens to improve spatial reasoning and trajectory regression in autonomous driving.
-
SimScale: Learning to Drive via Real-World Simulation at Scale
SimScale synthesizes unseen driving states from real logs via neural rendering and reactive environments, generates pseudo-expert trajectories, and shows that co-training on real plus simulated data improves planning ...
-
CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving
CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.
-
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
PRIX presents an efficient camera-only planner with a novel CaRT module that matches larger multimodal models on NavSim and nuScenes while reducing model size and inference time.
-
Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving
EnDfuser replaces point-estimate trajectory planning with ensemble diffusion in a single attention-pooling transformer module to model posterior trajectory uncertainty and improve safety in end-to-end autonomous driving.
-
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
FSDrive uses a generated future scene frame as visual spatio-temporal CoT to improve VLA models for safer autonomous driving trajectory prediction.
-
VERDI: VLM-Embedded Reasoning for Autonomous Driving
VERDI aligns perception, prediction, and planning outputs of end-to-end AD models with VLM-generated text features at training time to embed structured reasoning, yielding up to 11% better L2 distance and 10% higher n...
-
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ORION reports 77.74 Driving Score and 54.62% Success Rate on Bench2Drive, outperforming prior end-to-end methods by 14.28 DS and 19.61% SR through unified VQA and planning optimization.
-
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks
Uni-NaVid unifies diverse embodied navigation tasks into one video-based vision-language-action model trained on 3.6 million samples from four sub-tasks, achieving state-of-the-art performance on benchmarks and real-w...
-
EMMA: End-to-End Multimodal Model for Autonomous Driving
EMMA is an end-to-end multimodal LLM that converts camera data into trajectories, objects, and road graphs via text prompts and reports state-of-the-art motion planning on nuScenes plus competitive detection results on Waymo.
-
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Senna decouples language-based high-level planning from an LVLM with low-level trajectory prediction from an E2E model, reporting 27% lower planning error and 33% lower collisions after pre-training on DriveX and fine...
-
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
Hydra-MDP uses multi-teacher distillation and a multi-head decoder to learn diverse, metric-specific trajectories in an end-to-end autonomous-driving planner, winning the Navsim challenge.
-
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving
SafeAlign-VLA uses counterfactual safety pairing and anchor-based group relative policy optimization to incorporate negative data for safer VLA-based autonomous driving.
-
DriveSafer: End-to-End Autonomous Driving with Safety Guidance
DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-tim...
-
EponaV2: Driving World Model with Comprehensive Future Reasoning
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
-
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.
-
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.
-
Driving Intents Amplify Planning-Oriented Reinforcement Learning
DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.
-
REAP: Reinforcement-Learning End-to-End Autonomous Parking with Gaussian Splatting Simulator for Real2Sim2Real Transfer
REAP trains an end-to-end SAC policy with behavior cloning and collision penalties inside a 3DGS Real2Sim simulator and transfers it to physical vehicles, succeeding in narrow mechanical parking slots.
-
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.
-
CrowdVLA: Embodied Vision-Language-Action Agents for Context-Aware Crowd Simulation
CrowdVLA introduces vision-language-action agents for crowd simulation that reason about scene semantics, social norms, and action consequences using fine-tuned models and simulation rollouts.
-
DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
DynFlowDrive models action-conditioned scene transitions via rectified flow in latent space and adds stability-aware trajectory selection, showing gains on nuScenes and NavSim without added inference cost.
-
DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
DIVER uses RL-guided diffusion to produce diverse feasible trajectories from one ground-truth path, addressing mode collapse in imitation learning for autonomous driving.
-
FocalAD: Local Motion Planning for End-to-End Autonomous Driving
FocalAD adds an ego-local graph interactor and focal loss to prioritize decision-critical neighbors, yielding lower collision rates than prior methods on nuScenes, Bench2Drive, and especially the Adv-nuScenes robustness set.
-
DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
DeepSight uses parallel latent feature prediction in BEV for long-horizon world modeling and adaptive text reasoning to reach state-of-the-art closed-loop performance on the Bench2Drive benchmark.
-
Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive
Cross-benchmark analysis of 8 methods shows NAVSIM PDM Score correlates with Bench2Drive Driving Score at Spearman ρ=0.90, with Ego Progress as the strongest single predictor and a simpler 3-metric formula matching th...