hub Canonical reference

Vision-language-action models for autonomous driving: Past, present, and future

Canonical reference. 100% of citing Pith papers cite this work as background.

16 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary

background 7

citation-polarity summary

years

2026 16

roles

background 6

polarities

background 6

representative citing papers

Grounding Driving VLA via Inverse Kinematics

cs.CV · 2026-05-20 · conditional · novelty 7.0

By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.

citing papers explorer

Showing 16 of 16 citing papers.