Devil is in narrow policy: Unleashing exploration in driving VLA models

Canyu Chen, Yuguang Yang, Zhewen Tan, Yizhi Wang, Ruiyi Zhan, Haiyan Liu, Xuanyao Mao, Jason Bao, Xinyue Tang, Linlin Yang, Bingchuan Sun, Yan Wang, Baochang Zhang · 2026 · arXiv 2603.06049

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Action Emergence from Streaming Intent

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

A new VLA model called SI uses a four-step chain-of-thought to derive driving intent and applies it via classifier-free guidance to a flow-matching trajectory generator, showing competitive Waymo scores and intent-controllable plans.

CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning

cs.RO · 2026-05-14 · conditional · novelty 6.0

CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

HiDrive is a new closed-loop benchmark for autonomous driving that emphasizes long-tail scenarios, rare objects, and advanced capabilities like legal compliance and moral reasoning.

EponaV2: Driving World Model with Comprehensive Future Reasoning

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.

Driving Intents Amplify Planning-Oriented Reinforcement Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.

CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop performance on Bench2Drive across multiple driving architectures.

citing papers explorer

Showing 6 of 6 citing papers.

Action Emergence from Streaming Intent cs.RO · 2026-05-12 · unverdicted · none · ref 7 · 2 links
A new VLA model called SI uses a four-step chain-of-thought to derive driving intent and applies it via classifier-free guidance to a flow-matching trajectory generator, showing competitive Waymo scores and intent-controllable plans.
CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning cs.RO · 2026-05-14 · conditional · none · ref 3
CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.
HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving cs.RO · 2026-05-11 · unverdicted · none · ref 3
HiDrive is a new closed-loop benchmark for autonomous driving that emphasizes long-tail scenarios, rare objects, and advanced capabilities like legal compliance and moral reasoning.
EponaV2: Driving World Model with Comprehensive Future Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 7
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
Driving Intents Amplify Planning-Oriented Reinforcement Learning cs.RO · 2026-05-12 · unverdicted · none · ref 5 · 2 links
DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.
CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies cs.LG · 2026-05-06 · unverdicted · none · ref 39
CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop performance on Bench2Drive across multiple driving architectures.

Devil is in narrow policy: Unleashing exploration in driving VLA models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer