By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2roles
background 1polarities
background 1representative citing papers
The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.
citing papers explorer
-
Grounding Driving VLA via Inverse Kinematics
By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.
-
BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving
The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.