hub Mixed citations

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu · 2025 · cs.CV · arXiv 2506.08052

Mixed citation behavior. Most common role is background (60%).

36 Pith papers citing it

Background 60% of classified citations

open full Pith review browse 36 citing papers arXiv PDF

abstract

Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. All code, model weights, and datasets will be made publicly available to facilitate subsequent research.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9 baseline 6

citation-polarity summary

background 9 baseline 6

representative citing papers

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

cs.RO · 2026-05-26 · unverdicted · novelty 7.0

TPS-Drive uses an agent-centric tokenizer supervised by a frozen 3D detection head to purify VLM spatial representations, enabling better scene forecasting and lower collision rates on nuScenes and NAVSIM benchmarks.

Grounding Driving VLA via Inverse Kinematics

cs.CV · 2026-05-20 · conditional · novelty 7.0

By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.

SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving

cs.RO · 2026-04-13 · unverdicted · novelty 7.0 · 2 refs

SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.

The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

cs.RO · 2026-03-14 · unverdicted · novelty 7.0

PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

A structured perturbation framework applied to VLA driving models reveals evaluation-dependent visual grounding patterns and uneven dependency across abstraction levels.

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

CLAP reduces planning error on challenging driving scenarios by 24% on NAVSIM using contrastive latent-space prompt optimization on frozen VLA models with no regression on normal frames.

CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning

cs.RO · 2026-05-14 · conditional · novelty 6.0

CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

VECTOR-DRIVE uses shared self-attention with semantic-aware expert routing of tokens to VL and trajectory experts plus flow-matching action decoding to reach 88.91 driving score on Bench2Drive.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment

cs.RO · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

GSDrive combines IL priors with RL feedback by probing multi-mode futures inside a 3D Gaussian Splatting simulator to supply dense rewards for closed-loop driving policy improvement on nuScenes.

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

Creates LTD dataset for open-ended traffic VQA and trains UniVLT model to achieve SOTA on unified microscopic AD and macroscopic traffic reasoning tasks.

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

TRFP combines rectified flow models with truncation to support multimodal policies in MaxEnt RL while allowing fast one-step sampling and stable training.

DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

cs.RO · 2026-02-26 · unverdicted · novelty 6.0

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.

DriveLaW:Unifying Planning and Video Generation in a Latent Driving World

cs.CV · 2025-12-29 · unverdicted · novelty 6.0

DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.

Pseudo-Expert Regularized Offline RL for End-to-End Autonomous Driving in Photorealistic Closed-Loop Environments

cs.RO · 2025-12-21 · conditional · novelty 6.0

Pseudo-expert regularized offline RL reduces collisions and improves route completion for camera-based driving models trained on fixed simulator datasets from nuScenes.

citing papers explorer

Showing 14 of 14 citing papers after filters.

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving cs.RO · 2026-05-26 · unverdicted · none · ref 82 · internal anchor
TPS-Drive uses an agent-centric tokenizer supervised by a frozen 3D detection head to purify VLM spatial representations, enabling better scene forecasting and lower collision rates on nuScenes and NAVSIM benchmarks.
SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving cs.RO · 2026-04-13 · unverdicted · none · ref 14 · 2 links · internal anchor
SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.
Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving cs.RO · 2026-03-14 · unverdicted · none · ref 35 · internal anchor
PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.
AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond cs.RO · 2026-05-25 · unverdicted · none · ref 28 · internal anchor
AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.
CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning cs.RO · 2026-05-14 · conditional · none · ref 23 · internal anchor
CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving cs.RO · 2026-05-12 · unverdicted · none · ref 27 · 2 links · internal anchor
MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving cs.RO · 2026-05-06 · unverdicted · none · ref 112 · 2 links · internal anchor
ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment cs.RO · 2026-04-30 · unverdicted · none · ref 12 · 2 links · internal anchor
GSDrive combines IL priors with RL feedback by probing multi-mode futures inside a 3D Gaussian Splatting simulator to supply dense rewards for closed-loop driving policy improvement on nuScenes.
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving cs.RO · 2026-02-26 · unverdicted · none · ref 28 · internal anchor
The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.
Pseudo-Expert Regularized Offline RL for End-to-End Autonomous Driving in Photorealistic Closed-Loop Environments cs.RO · 2025-12-21 · conditional · none · ref 34 · internal anchor
Pseudo-expert regularized offline RL reduces collisions and improves route completion for camera-based driving models trained on fixed simulator datasets from nuScenes.
MiMo-Embodied: X-Embodied Foundation Model Technical Report cs.RO · 2025-11-20 · unverdicted · none · ref 32 · internal anchor
MiMo-Embodied is a single foundation model that achieves state-of-the-art results on 17 embodied AI benchmarks and 12 autonomous driving benchmarks through multi-stage learning, curated data, and CoT/RL fine-tuning that produces positive cross-domain transfer.
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail cs.RO · 2025-10-30 · conditional · none · ref 46 · internal anchor
Alpamayo-R1 introduces a VLA model with a Chain of Causation dataset and multi-stage SFT-plus-RL training that reports 12% better planning accuracy and 35% fewer close encounters versus trajectory-only baselines in driving tasks.
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving cs.RO · 2026-05-19 · unverdicted · none · ref 34 · internal anchor
SafeAlign-VLA uses counterfactual safety pairing and anchor-based group relative policy optimization to incorporate negative data for safer VLA-based autonomous driving.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling cs.RO · 2026-05-13 · unverdicted · none · ref 25 · 2 links · internal anchor
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer