AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
arXiv preprint arXiv:2503.07234
5 Pith papers cite this work.
Citing papers
- AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
  AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
- Steins;Gate Drive: Semantic Safety Arbitration over Structured Futures for Latency-Decoupled LLM Planning
  SteinsGateDrive decouples LLM inference latency from vehicle control by pre-selecting alpha, beta, and gamma worldline futures that a runtime validates against safety contracts until abort conditions trigger.
- SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
  SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.
- SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles
  SAIL reduces prediction error by up to 28.8% on the hardest 1% of long-tail trajectory samples in AV datasets through attribute-guided augmentation and adaptive contrastive learning with cosine momentum, hard-negative mining, and dynamic pseudo-labeling.
- DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
  DeepSight uses parallel latent feature prediction in BEV for long-horizon world modeling and adaptive text reasoning to reach state-of-the-art closed-loop performance on the Bench2Drive benchmark.