
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu · 2025 · cs.CV · arXiv 2506.08052

Mixed citation behavior: the most common role is background (60% of classified citations).

48 Pith papers cite it.

abstract

Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. All code, model weights, and datasets will be made publicly available to facilitate subsequent research.
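The abstract names a Diffusion Group Relative Policy Optimization (DiffGRPO) stage but gives no details of its reward or update rule. As a rough illustration of the "group relative" idea only, the sketch below normalizes each sampled trajectory's reward against the other samples drawn for the same scene, as is common in GRPO-style methods. The function name, the `eps` constant, and the reward values are hypothetical and are not taken from ReCogDrive.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Return one advantage per sample, normalized within the group.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    the usual GRPO-style normalization. This is not ReCogDrive's
    confirmed formulation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for the same scene.
rewards = [0.9, 0.5, 0.7, 0.3]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so no separate learned value baseline is needed in this scheme.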

citation-role summary

background: 9 · baseline: 6

citation-polarity summary

background: 9 · baseline: 6

representative citing papers

VLADriveBench: Evaluating CoT-Action Relationship in VLA for Autonomous Driving

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

VLADriveBench combines observational metrics and CoT intervention protocols to evaluate the relevance and causality of reasoning in vision-language-action models for autonomous driving, revealing divergent model behaviors.

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

cs.RO · 2026-05-26 · unverdicted · novelty 7.0

TPS-Drive uses an agent-centric tokenizer supervised by a frozen 3D detection head to purify VLM spatial representations, enabling better scene forecasting and lower collision rates on nuScenes and NAVSIM benchmarks.

Grounding Driving VLA via Inverse Kinematics

cs.CV · 2026-05-20 · conditional · novelty 7.0

By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.

SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving

cs.RO · 2026-04-13 · unverdicted · novelty 7.0 · 2 refs

SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.

The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

cs.RO · 2026-03-14 · unverdicted · novelty 7.0

PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.

Teaching Vision-Language-Action Models What to See and Where to Look

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

DriveTeach-VLA adds Driving-aware Vision Distillation pretraining and 2D Trajectory-Guided Prompts to VLA models, then reports state-of-the-art results on NAVSIM and nuScenes.

DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.

X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

X-Mind proposes an efficient internal visual chain-of-thought using compressed BEV sketches and recurrent block diffusion to embed predictive world models into end-to-end driving policies.

VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

VLGA introduces geometry as a fourth modality in VLA models via pointmap regression loss, reporting SOTA open-loop and closed-loop driving metrics on nuScenes and Bench2Drive.

D³-MoE: Dual Disentangled Diffusion Mixture-of-Experts for Style-Controllable End-to-End Autonomous Driving

cs.RO · 2026-06-03 · unverdicted · novelty 6.0

D³-MoE disentangles style and physical axes with diffusion and self-supervised MoE experts to produce style-controllable trajectories, reporting SOTA 88.2 PDMS on NAVSIM.

IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

IDOL uses inverse dynamics on adjacent predicted latent futures to extract planning-relevant motion deltas, then optimizes trajectories with a closed-loop refinement step, reporting SOTA results on NAVSIM v1 and v2.

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

A structured perturbation framework applied to VLA driving models reveals evaluation-dependent visual grounding patterns and uneven dependency across abstraction levels.

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

CLAP reduces planning error on challenging driving scenarios by 24% on NAVSIM using contrastive latent-space prompt optimization on frozen VLA models with no regression on normal frames.

CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning

cs.RO · 2026-05-14 · conditional · novelty 6.0

CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

VECTOR-Drive uses shared self-attention with semantic-aware routing of tokens to VL and trajectory experts, plus flow-matching action decoding, to reach a driving score of 88.91 on Bench2Drive.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment

cs.RO · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

GSDrive combines IL priors with RL feedback by probing multi-mode futures inside a 3D Gaussian Splatting simulator to supply dense rewards for closed-loop driving policy improvement on nuScenes.

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

The paper creates the LTD dataset for open-ended traffic VQA and trains the UniVLT model to achieve SOTA results on unified microscopic AD and macroscopic traffic reasoning tasks.

citing papers explorer

Showing 17 of 17 citing papers after filters.

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

cs.RO · 2026-05-26 · unverdicted · none · ref 82 · internal anchor

TPS-Drive uses an agent-centric tokenizer supervised by a frozen 3D detection head to purify VLM spatial representations, enabling better scene forecasting and lower collision rates on nuScenes and NAVSIM benchmarks.

SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving

cs.RO · 2026-04-13 · unverdicted · none · ref 14 · 2 links · internal anchor

SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

cs.RO · 2026-03-14 · unverdicted · none · ref 35 · internal anchor

PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.

D³-MoE: Dual Disentangled Diffusion Mixture-of-Experts for Style-Controllable End-to-End Autonomous Driving

cs.RO · 2026-06-03 · unverdicted · none · ref 38 · internal anchor

D³-MoE disentangles style and physical axes with diffusion and self-supervised MoE experts to produce style-controllable trajectories, reporting SOTA 88.2 PDMS on NAVSIM.

IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving

cs.RO · 2026-05-29 · unverdicted · none · ref 39 · internal anchor

IDOL uses inverse dynamics on adjacent predicted latent futures to extract planning-relevant motion deltas, then optimizes trajectories with a closed-loop refinement step, reporting SOTA results on NAVSIM v1 and v2.

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

cs.RO · 2026-05-25 · unverdicted · none · ref 28 · internal anchor

AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.

CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning

cs.RO · 2026-05-14 · conditional · none · ref 23 · internal anchor

CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · none · ref 27 · 2 links · internal anchor

MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · none · ref 112 · 2 links · internal anchor

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment

cs.RO · 2026-04-30 · unverdicted · none · ref 12 · 2 links · internal anchor

GSDrive combines IL priors with RL feedback by probing multi-mode futures inside a 3D Gaussian Splatting simulator to supply dense rewards for closed-loop driving policy improvement on nuScenes.

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

cs.RO · 2026-02-26 · unverdicted · none · ref 28 · internal anchor

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.

Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

cs.RO · 2026-06-16 · unverdicted · none · ref 14 · 2 links · internal anchor

Qwen-RobotNav provides a parameterized navigation model trained on 15.6M samples with vision-language co-training that achieves SOTA results on benchmarks and zero-shot transfer to real robots.

Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

cs.RO · 2026-06-04 · unverdicted · none · ref 44 · internal anchor

Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.

SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

cs.RO · 2026-05-19 · unverdicted · none · ref 34 · internal anchor

SafeAlign-VLA uses counterfactual safety pairing and anchor-based group relative policy optimization to incorporate negative data for safer VLA-based autonomous driving.

Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling

cs.RO · 2026-05-13 · unverdicted · none · ref 25 · 2 links · internal anchor

CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.

LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

cs.RO · 2026-06-07 · unverdicted · none · ref 25 · internal anchor

LUNA-AD introduces a tri-system model with multi-agent hypothesis exploration, distilled lightweight inference, and reflection-driven lifelong learning that claims state-of-the-art success rates on nuPlan benchmarks with reduced latency.

CLEAR: Cognition and Latent Evaluation for Adaptive Routing in End-to-End Autonomous Driving

cs.RO · 2026-06-04 · unverdicted · none · ref 17 · internal anchor

CLEAR achieves state-of-the-art PDMS of 93.7 on NAVSIM v1 by combining single-step VAE latent drift with Qwen 3.5-guided adaptive scheduling and trajectory scoring for end-to-end driving.
