AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
arXiv preprint arXiv:2407.00959 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4polarities
background 2representative citing papers
SAIL reduces prediction error by up to 28.8% on the hardest 1% of long-tail trajectory samples in AV datasets through attribute-guided augmentation and adaptive contrastive learning with cosine momentum, hard-negative mining, and dynamic pseudo-labeling.
DeepSight uses parallel latent feature prediction in BEV for long-horizon world modeling and adaptive text reasoning to reach state-of-the-art closed-loop performance on the Bench2drive benchmark.
A survey synthesizing sensor fusion strategies, AV datasets, and emerging LLM/VLM-powered object detection pipelines for autonomous vehicles.
citing papers explorer
-
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
-
SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles
SAIL reduces prediction error by up to 28.8% on the hardest 1% of long-tail trajectory samples in AV datasets through attribute-guided augmentation and adaptive contrastive learning with cosine momentum, hard-negative mining, and dynamic pseudo-labeling.
-
DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
DeepSight uses parallel latent feature prediction in BEV for long-horizon world modeling and adaptive text reasoning to reach state-of-the-art closed-loop performance on the Bench2drive benchmark.
-
All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles
A survey synthesizing sensor fusion strategies, AV datasets, and emerging LLM/VLM-powered object detection pipelines for autonomous vehicles.