ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving

· 2026 · cs.CV · arXiv 2604.02714

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

End-to-end autonomous driving models based on Vision-Language-Action (VLA) architectures have shown promising results by learning driving policies through behavior cloning on expert demonstrations. However, imitation learning inherently limits the model to replicating observed behaviors without exploring diverse driving strategies, leaving it brittle in novel or out-of-distribution scenarios. Reinforcement learning (RL) offers a natural remedy by enabling policy exploration beyond the expert distribution. Yet VLA models, typically trained on offline datasets, lack directly observable state transitions, necessitating a learned world model to anticipate action consequences. In this work, we propose a unified understanding-and-generation framework that leverages world modeling to simultaneously enable meaningful exploration and provide dense supervision. Specifically, we augment trajectory prediction with future RGB and depth image generation as dense world modeling objectives, requiring the model to learn fine-grained visual and geometric representations that substantially enrich the planning backbone. Beyond serving as a supervisory signal, the world model further acts as a source of intrinsic reward for policy exploration: its image prediction uncertainty naturally measures a trajectory's novelty relative to the training distribution, where high uncertainty indicates out-of-distribution scenarios that, if safe, represent valuable learning opportunities. We incorporate this exploration signal into a safety-gated reward and optimize the policy via Group Relative Policy Optimization (GRPO). Experiments on the NAVSIM and nuScenes benchmarks demonstrate the effectiveness of our approach, achieving a state-of-the-art PDMS score of 93.7 and an EPDMS of 88.8 on NAVSIM. The code and demo will be publicly available at https://zihaosheng.github.io/ExploreVLA/.

representative citing papers

Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

cs.CV · 2026-05-20 · unverdicted · novelty 5.0 · 2 refs

CoPhy is a new RL framework that distills VLM cognition into BEV encoders, adds an auto-regressive BEV world model for action-conditioned future prediction, and optimizes policies via GRPO with dual physical-cognitive rewards, claiming SOTA on NAVSIM v1/v2.

citing papers explorer

Showing 1 of 1 citing paper.

Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving cs.CV · 2026-05-20 · unverdicted · none · ref 46 · 2 links · internal anchor
CoPhy is a new RL framework that distills VLM cognition into BEV encoders, adds an auto-regressive BEV world model for action-conditioned future prediction, and optimizes policies via GRPO with dual physical-cognitive rewards, claiming SOTA on NAVSIM v1/v2.

ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving

fields

years

verdicts

representative citing papers

citing papers explorer