pith. sign in

Devil is in narrow policy: Unleashing exploration in driving VLA models

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 3

citation-polarity summary

years

2026 6

roles

background 3

polarities

background 3

representative citing papers

Action Emergence from Streaming Intent

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

A new VLA model called SI uses a four-step chain-of-thought to derive driving intent and applies it via classifier-free guidance to a flow-matching trajectory generator, showing competitive Waymo scores and intent-controllable plans.

Driving Intents Amplify Planning-Oriented Reinforcement Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.

citing papers explorer

Showing 6 of 6 citing papers.

  • Action Emergence from Streaming Intent cs.RO · 2026-05-12 · unverdicted · none · ref 7 · 2 links

    A new VLA model called SI uses a four-step chain-of-thought to derive driving intent and applies it via classifier-free guidance to a flow-matching trajectory generator, showing competitive Waymo scores and intent-controllable plans.

  • CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning cs.RO · 2026-05-14 · conditional · none · ref 3

    CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

  • HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving cs.RO · 2026-05-11 · unverdicted · none · ref 3

    HiDrive is a new closed-loop benchmark for autonomous driving that emphasizes long-tail scenarios, rare objects, and advanced capabilities like legal compliance and moral reasoning.

  • EponaV2: Driving World Model with Comprehensive Future Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 7

    EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.

  • Driving Intents Amplify Planning-Oriented Reinforcement Learning cs.RO · 2026-05-12 · unverdicted · none · ref 5 · 2 links

    DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.

  • CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies cs.LG · 2026-05-06 · unverdicted · none · ref 39

    CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop performance on Bench2Drive across multiple driving architectures.