EvoDriveVLA: Evolving Driving VLA Models via Collaborative Perception-Planning Distillation

· 2026 · cs.CV · arXiv 2603.09465

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and future-informed trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, future-informed trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to synthesize reasoning trajectories that model future evolutions, enabling the student model to internalize the future-aware insights of the teacher. EvoDriveVLA achieves SOTA performance in nuScenes open-loop evaluation and significantly enhances performance in NAVSIM closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.
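The abstract mentions Monte Carlo dropout sampling without giving details. As a rough illustration of the general technique only (not the authors' implementation), the sketch below keeps dropout active at inference time and runs a toy linear "teacher head" several times to obtain a spread of candidate outputs. The function names, feature values, weights, and dropout rate are all hypothetical.

```python
import random
from statistics import mean, pvariance


def dropout(features, p, rng):
    """Inverted dropout: zero each feature with probability p, rescale the rest."""
    keep = 1.0 - p
    return [f / keep if rng.random() < keep else 0.0 for f in features]


def predict_waypoint(features, weights, p, rng):
    """Toy stand-in for a teacher head: a linear map over dropout-perturbed features."""
    dropped = dropout(features, p, rng)
    return sum(w * f for w, f in zip(weights, dropped))


def mc_dropout_samples(features, weights, p=0.2, n_samples=8, seed=0):
    """Run the head n_samples times with dropout left on, yielding varied predictions."""
    rng = random.Random(seed)
    return [predict_waypoint(features, weights, p, rng) for _ in range(n_samples)]


# Hypothetical inputs, chosen only for illustration.
features = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -0.25, 0.1, 0.3]

samples = mc_dropout_samples(features, weights)
print("mean:", mean(samples), "variance:", pvariance(samples))
```

With `p=0.0` every pass returns the same deterministic value; a nonzero `p` produces a set of differing candidates whose spread can serve as a crude uncertainty signal. How EvoDriveVLA actually selects or refines the sampled trajectories is described in the paper and its repository, not here.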

representative citing papers

GraspFoM: Towards Reconstruction-Driven Robotic Grasping with 3D Foundation Priors

cs.RO · 2026-06-07 · unverdicted · novelty 5.0

GraspFoM creates a shared 3D latent from SAM3D priors, adds an anchor-initialized diffuser for multimodal grasps, and uses reconstruction-aware scoring plus residual updates to jointly achieve SOTA reconstruction and grasping with few extra parameters.

SparseStreet: Sparse Gaussian Splatting for Real-Time Street Scene Simulation

cs.CV · 2026-06-02 · unverdicted · novelty 5.0

SparseStreet applies node-based learnable pruning followed by static background compression to 3D Gaussian Splatting, reporting up to 80% reduction in primitives with minimal quality loss on Waymo and nuScenes street scene data.

citing papers explorer

GraspFoM: Towards Reconstruction-Driven Robotic Grasping with 3D Foundation Priors

cs.RO · 2026-06-07 · unverdicted · none · ref 5 · internal anchor

GraspFoM creates a shared 3D latent from SAM3D priors, adds an anchor-initialized diffuser for multimodal grasps, and uses reconstruction-aware scoring plus residual updates to jointly achieve SOTA reconstruction and grasping with few extra parameters.
