Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
8 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary: years 2026 (8); verdicts UNVERDICTED (8); roles baseline (1); polarities baseline (1).
Citing papers
- Learning Vision-Language-Action World Models for Autonomous Driving
  VLA-World improves autonomous driving with action-guided future image generation followed by reflective reasoning over the imagined scene to refine trajectories (a sketch of this imagine-reflect-refine loop follows the list).
- MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
  MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.
- AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation
  AsyncShield is a plug-and-play edge module that recovers the geometric intent of latency-delayed cloud VLA commands via kinematic pose mapping, then uses PPO-Lagrangian training to balance trajectory tracking against LiDAR-based safety constraints (a sketch of the Lagrangian update follows the list).
- EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
  EgoDyn-Bench exposes a perception bottleneck in vision-centric foundation models: their ego-motion reasoning derives largely from language priors while visual input adds negligible signal, and supplying explicit trajectories restores consistency.
- OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models
  OneDrive unifies heterogeneous decoding paradigms in a single VLM transformer decoder for end-to-end driving, achieving 0.28 m L2 error and a 0.18 collision rate on nuScenes, plus 86.8 PDMS on NAVSIM.
- DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
  DynFlowDrive models action-conditioned scene transitions via rectified flow in latent space and adds stability-aware trajectory selection, showing gains on nuScenes and NAVSIM without added inference cost (a sketch of the flow-based transition follows the list).
- EvoDriveVLA: Evolving Driving VLA Models via Collaborative Perception-Planning Distillation
  EvoDriveVLA uses collaborative perception-planning distillation with self-anchor and future-aware teachers to address perception degradation and long-term instability in driving VLA models, reaching state-of-the-art results on nuScenes and NAVSIM.
- XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
  XEmbodied integrates 3D geometric and physical cues into VLMs via a 3D Adapter and an Efficient Image-Embodied Adapter, and adds a progressive curriculum with RL post-training to improve spatial reasoning and embodied performance across 18 benchmarks.
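
To make VLA-World's imagine-reflect-refine idea concrete, here is a minimal Python sketch of the loop as the summary describes it: propose a trajectory, imagine its visual consequences with an action-conditioned world model, reason over the imagined scene, and refine. Every name here (StubPolicy, StubWorldModel, StubReflector, plan_with_reflection) is a hypothetical stand-in, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    is_safe: bool
    note: str = ""

class StubPolicy:
    def propose(self, obs):
        # initial VLA proposal: straight-line waypoints in the ego frame
        return [(0.0, float(i)) for i in range(6)]
    def refine(self, obs, traj, critique):
        # reflection-guided update: nudge the trajectory laterally
        return [(x + 0.2, y) for x, y in traj]

class StubWorldModel:
    def rollout(self, obs, traj):
        # action-guided future image generation, one frame per waypoint
        return [f"imagined frame near {wp}" for wp in traj]

class StubReflector:
    def __init__(self):
        self.calls = 0
    def reason(self, frames):
        # pretend the first imagined rollout looks unsafe, then safe
        self.calls += 1
        return Critique(is_safe=self.calls > 1, note="drifting toward curb")

def plan_with_reflection(obs, policy, world_model, reflector, n_rounds=3):
    traj = policy.propose(obs)
    for _ in range(n_rounds):
        imagined = world_model.rollout(obs, traj)   # imagine consequences
        critique = reflector.reason(imagined)       # reflect over imagined scene
        if critique.is_safe:
            break
        traj = policy.refine(obs, traj, critique)   # refine the trajectory
    return traj

print(plan_with_reflection(None, StubPolicy(), StubWorldModel(), StubReflector()))
```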
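AsyncShield's PPO-Lagrangian balance can be illustrated with the standard constrained-RL recipe: PPO optimizes a scalarized reward r - lambda * c, while dual ascent raises the multiplier whenever the mean safety cost exceeds a budget. The code below is a hedged sketch of that dual update on synthetic data; the budget, learning rate, and cost definition are illustrative, not the paper's values.

```python
import numpy as np

def lagrangian_reward(track_reward, safety_cost, lam):
    """Scalarized objective fed to the PPO update: r - lambda * c."""
    return track_reward - lam * safety_cost

def dual_update(lam, mean_episode_cost, budget=0.05, lr=0.01):
    """Dual ascent on the multiplier: grow lambda while the safety
    constraint is violated, shrink (never below 0) when satisfied."""
    return max(0.0, lam + lr * (mean_episode_cost - budget))

lam = 0.0
rng = np.random.default_rng(0)
for epoch in range(5):
    # stand-in for a PPO rollout batch: per-step tracking rewards and
    # LiDAR-derived proximity costs
    rewards = rng.normal(1.0, 0.1, size=256)
    costs = rng.uniform(0.0, 0.2, size=256)
    shaped = lagrangian_reward(rewards, costs, lam)  # what PPO would optimize
    lam = dual_update(lam, costs.mean())
    print(f"epoch {epoch}: lambda={lam:.3f}, mean shaped reward={shaped.mean():.3f}")
```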
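Finally, a sketch of an action-conditioned rectified-flow transition in latent space, matching what DynFlowDrive's summary describes: a velocity field is integrated from noise to the next scene latent with a few Euler steps (rectified flows trace near-straight paths, which is how such a sampler can avoid added inference cost). The VelocityField MLP, the conditioning scheme, and all dimensions are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    def __init__(self, z_dim=32, a_dim=4):
        super().__init__()
        # inputs: flowing latent z, current scene latent z_t, action, flow time
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim + a_dim + 1, 128), nn.SiLU(),
            nn.Linear(128, z_dim),
        )
    def forward(self, z, z_t, a, tau):
        # predict dz/dtau conditioned on the current latent and action
        return self.net(torch.cat([z, z_t, a, tau], dim=-1))

@torch.no_grad()
def next_latent(v_field, z_t, action, steps=8):
    """Integrate the rectified flow from noise to the next scene latent
    with simple Euler steps; few steps suffice for near-straight paths."""
    z = torch.randn_like(z_t)
    dt = 1.0 / steps
    for i in range(steps):
        tau = torch.full((z.shape[0], 1), i * dt)
        z = z + dt * v_field(z, z_t, action, tau)  # Euler step along the flow
    return z

v = VelocityField()
z0 = torch.randn(2, 32)
a = torch.randn(2, 4)
print(next_latent(v, z0, a).shape)  # torch.Size([2, 32])
```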