Insightdrive: Insight scene representation for end-to-end autonomous driving
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CV (2)
Years: 2026 (2)

Citing papers:
- Grounding Driving VLA via Inverse Kinematics
  By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.
- OWMDrive: Causality-Aware End-to-End Autonomous Driving via 4D Occupancy World Model
  OWMDrive combines multi-step 3D occupancy forecasting with diffusion planning to produce more foresighted trajectories in autonomous driving.