LWDrive uses future-frame supervision on VLMs to create world-model features that a multi-layer Foresight Cascade Planner refines into final trajectories, reporting 92.0 on NAVSIM and 89.6 on NAVSIM-v2.
Driveworld-vla: Unified latent-space world modeling with vision-language-action for au- tonomous driving.ArXiv, abs/2602.06521
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6roles
background 3polarities
background 3representative citing papers
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
SteinsGateDrive decouples LLM inference latency from vehicle control by pre-selecting alpha, beta, and gamma worldline futures that a runtime validates against safety contracts until abort conditions trigger.
LVDrive improves closed-loop driving on Bench2Drive by adding latent future scene prediction to VLA models via unified embedding space processing and two-stage trajectory decoding.
SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.
citing papers explorer
-
LWDrive: Layer-Wise World-Model-Guided Vision-Language Model Planning for Autonomous Driving
LWDrive uses future-frame supervision on VLMs to create world-model features that a multi-layer Foresight Cascade Planner refines into final trajectories, reporting 92.0 on NAVSIM and 89.6 on NAVSIM-v2.
-
DriveFuture: Future-Aware Latent World Models for Autonomous Driving
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
-
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
-
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model
LVDrive improves closed-loop driving on Bench2Drive by adding latent future scene prediction to VLA models via unified embedding space processing and two-stage trajectory decoding.
-
SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.