SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

· 2026 · cs.CV · arXiv 2604.19710

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit limited robustness. In this paper, we propose SpanVLA, a novel end-to-end autonomous driving framework, integrating an autoregressive reasoning and a flow-matching action expert. First, SpanVLA introduces an efficient bridge to leverage the vision and reasoning guidance of VLM to efficiently plan future trajectories using a flow-matching policy conditioned on historical trajectory initialization, which significantly reduces inference time. Second, to further improve the performance and robustness of the SpanVLA model, we propose a GRPO-based post-training method to enable the VLA model not only to learn from positive driving samples but also to learn how to avoid the typical negative behaviors and learn recovery behaviors. We further introduce mReasoning, a new real-world driving reasoning dataset, focusing on complex, reasoning-demanding scenarios and negative-recovery samples. Extensive experiments on the NAVSIM (v1 and v2) demonstrate the competitive performance of the SpanVLA model. Additionally, the qualitative results across diverse scenarios highlight the planning performance and robustness of our model.

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.

ChainFlow-VLA: Causal Flow Planning with Vision-Language Models

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

ChainFlow-VLA unifies autoregressive causal trajectory modes with VLM-conditioned diffusion refinement to reach 94.85 on NAVSIM v1, matching human performance.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

SafeAlign-VLA uses counterfactual safety pairing and anchor-based group relative policy optimization to incorporate negative data for safer VLA-based autonomous driving.

citing papers explorer

Showing 4 of 4 citing papers.

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems cs.RO · 2026-05-11 · unverdicted · none · ref 42 · internal anchor
MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.
ChainFlow-VLA: Causal Flow Planning with Vision-Language Models cs.CV · 2026-05-22 · unverdicted · none · ref 24 · internal anchor
ChainFlow-VLA unifies autoregressive causal trajectory modes with VLM-conditioned diffusion refinement to reach 94.85 on NAVSIM v1, matching human performance.
DriveFuture: Future-Aware Latent World Models for Autonomous Driving cs.CV · 2026-05-10 · unverdicted · none · ref 59 · internal anchor
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving cs.RO · 2026-05-19 · unverdicted · none · ref 43 · internal anchor
SafeAlign-VLA uses counterfactual safety pairing and anchor-based group relative policy optimization to incorporate negative data for safer VLA-based autonomous driving.

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer