
arxiv: 2605.24531 · v1 · pith:W4WSDVJ4 · submitted 2026-05-23 · 💻 cs.CV

NudgeVAD: Language-Nudged End-to-End Driving via FiLM Residuals

keywords language · nudgevad · commands · becomes · driving · end-to-end · film · planner

Natural-language instructions promise controllable end-to-end driving, but their benefit can be hidden when planners already receive reliable high-level commands. We propose NudgeVAD, a frozen-planner residual framework that uses language as a calibrated nudge to a VAD trajectory. With identity-initialized FiLM and a zero-initialized residual head, NudgeVAD is equivalent to the frozen planner at initialization, so learned deviations arise only from language-conditioned residuals. We evaluate NudgeVAD along a command-reliability axis. With reliable commands, language improves the initial planner but becomes nearly redundant once compared against VAD-FT (UNCOND), a compute-matched VAD model fine-tuned without language. With random commands, however, language becomes essential: detaching text degrades ADE6s to 3.166 m, while NudgeVAD with text recovers 2.806 m and outperforms VAD-FT (UNCOND) by 0.312 m. These results show that language is not universally additive; it is most valuable when the categorical command channel is unreliable.
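The initialization property described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the class name `FiLMResidualHead`, the linear FiLM generator, the tensor shapes, and the `ade` helper are all hypothetical. The sketch only shows why identity-initialized FiLM (scale 1, shift 0) combined with a zero-initialized residual head reproduces the frozen planner's trajectory exactly before any training.

```python
import numpy as np


class FiLMResidualHead:
    """Hypothetical sketch: language-conditioned residual added to a frozen planner's trajectory."""

    def __init__(self, text_dim, feat_dim, horizon):
        # FiLM generator weights (assumed linear). Zero weights give
        # gamma = 1 and beta = 0, i.e. an identity modulation at initialization.
        self.W_gamma = np.zeros((text_dim, feat_dim))
        self.W_beta = np.zeros((text_dim, feat_dim))
        # Residual head is zero-initialized, so the predicted offset starts at exactly zero.
        self.W_res = np.zeros((feat_dim, horizon * 2))
        self.b_res = np.zeros(horizon * 2)
        self.horizon = horizon

    def forward(self, planner_feat, base_traj, text_emb):
        # planner_feat: (B, feat_dim) features from the frozen planner (assumed shape)
        # base_traj:    (B, horizon, 2) trajectory waypoints from the frozen planner
        # text_emb:     (B, text_dim) instruction embedding
        gamma = 1.0 + text_emb @ self.W_gamma          # FiLM scale
        beta = text_emb @ self.W_beta                  # FiLM shift
        modulated = gamma * planner_feat + beta        # feature-wise linear modulation
        residual = modulated @ self.W_res + self.b_res
        residual = residual.reshape(-1, self.horizon, 2)
        return base_traj + residual                    # nudge applied on top of the frozen output


def ade(pred, gt):
    """Average displacement error: mean L2 distance over all waypoints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


rng = np.random.default_rng(0)
head = FiLMResidualHead(text_dim=8, feat_dim=16, horizon=6)
feat = rng.normal(size=(2, 16))
base = rng.normal(size=(2, 6, 2))
text = rng.normal(size=(2, 8))

out = head.forward(feat, base, text)
print(np.allclose(out, base))  # True: the model equals the frozen planner at initialization
```

Under these assumptions, any deviation from the frozen trajectory can only appear once training moves the residual-head weights away from zero, which is the sense in which learned deviations "arise only from language-conditioned residuals".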

