NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Ishaan Rawal; Shubh Gupta; Wei Zhan; Yihan Hu

arxiv: 2602.21172 · v3 · pith:LSTYFGFUnew · submitted 2026-02-24 · 💻 cs.AI · cs.CV

keywords: NoRD, reasoning, GRPO, achieves, annotations, autonomous, bias, competitive

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data and without reasoning annotations, resulting in 3x fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems. Website: https://nord-vla-ai.github.io/
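The difficulty-bias point in the abstract can be illustrated with a minimal sketch of the advantage computation alone. Standard GRPO centers each group's rewards by the group mean and then divides by the group standard deviation, so every group ends up with roughly unit spread regardless of how much its rollouts actually differ in reward. Dr. GRPO drops that division (it also drops a response-length normalization that is not shown here). The function names and reward values below are hypothetical and are not taken from the NoRD codebase.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO: center by the group mean, then divide by the group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def dr_grpo_advantages(rewards):
    """Dr. GRPO-style: center by the group mean only, with no std division."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

# Two hypothetical driving scenarios, each with a group of 4 rollout rewards.
low_variance = [0.50, 0.52, 0.50, 0.48]   # rollouts nearly agree
high_variance = [0.10, 0.90, 0.20, 0.80]  # rollouts disagree strongly

for name, group in [("low-variance", low_variance), ("high-variance", high_variance)]:
    grpo_max = np.abs(grpo_advantages(group)).max()
    dr_max = np.abs(dr_grpo_advantages(group)).max()
    print(f"{name}: max |A| GRPO={grpo_max:.3f}  Dr. GRPO={dr_max:.3f}")
```

Under std normalization both groups produce advantages of similar magnitude, so the high-variance scenario's raw reward differences are scaled down relative to the low-variance one. Without it, the advantage magnitude tracks the raw reward spread.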

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Driving Intents Amplify Planning-Oriented Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 6.0

DIAL uses intent-conditioned CFG and multi-intent GRPO to expand and preserve diverse modes in continuous-action preference RL, lifting RFS to 9.14 and surpassing both prior best (8.5) and human demonstration (8.13).

Driving Intents Amplify Planning-Oriented Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 5.0

DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
cs.CV 2026-04 unverdicted novelty 5.0

SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.