ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu · 2025 · cs.CV · arXiv 2506.08052

Mixed citation behavior. The most common role is background (60% of classified citations).

55 Pith papers citing it

abstract

Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. All code, model weights, and datasets will be made publicly available to facilitate subsequent research.
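The abstract names a Diffusion Group Relative Policy Optimization (DiffGRPO) stage but does not spell out its formulation. As a rough illustration only, the sketch below shows the generic group-relative advantage normalization that GRPO-style methods are commonly built on: each sampled trajectory's reward is compared against the mean and standard deviation of its own sampling group. The function name, the `eps` parameter, and the reward values are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    Assumption: this is the generic GRPO-style normalization, not the
    DiffGRPO procedure from the paper, which the abstract does not detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps guards against division by zero when all rewards are identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for the same scene.
rewards = [0.9, 0.4, 0.7, 0.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Trajectories that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate learned value baseline is needed in this simplified picture.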

citation-role summary

background 9 baseline 6

citation-polarity summary

background 9 baseline 6

representative citing papers

FleetAgent: Teleoperation Assistant for Autonomous Fleets via Vectorized V2N Messages

cs.RO · 2026-06-19 · unverdicted · novelty 7.0

FleetAgent pairs a vector-to-embedding interface (VecFormer) with an MLLM to turn compact V2N messages into structured natural-language teleoperation assistance, cutting uplink payload 625x and improving Lingo-Judge score 16.8% on a new nuScenes-derived dataset.

VLADriveBench: Evaluating CoT-Action Relationship in VLA for Autonomous Driving

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

VLADriveBench combines observational metrics and CoT intervention protocols to evaluate the relevance and causality of reasoning in vision-language-action models for autonomous driving, revealing divergent model behaviors.

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

cs.RO · 2026-05-26 · unverdicted · novelty 7.0

TPS-Drive uses an agent-centric tokenizer supervised by a frozen 3D detection head to purify VLM spatial representations, enabling better scene forecasting and lower collision rates on nuScenes and NAVSIM benchmarks.

Grounding Driving VLA via Inverse Kinematics

cs.CV · 2026-05-20 · conditional · novelty 7.0

By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.

SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving

cs.RO · 2026-04-13 · unverdicted · novelty 7.0 · 2 refs

SCORP delivers 10-28% gains in safety and 2-7% in efficiency metrics on WOMD by using dual-path scene conditioning in diffusion planning plus variance-gated group-relative policy optimization for closed-loop stability.

The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

cs.RO · 2026-03-14 · unverdicted · novelty 7.0

PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.

Teaching Vision-Language-Action Models What to See and Where to Look

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

DriveTeach-VLA adds Driving-aware Vision Distillation pretraining and 2D Trajectory-Guided Prompts to VLA models, then reports state-of-the-art results on NAVSIM and nuScenes.

DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.

X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

X-Mind proposes an efficient internal visual chain-of-thought using compressed BEV sketches and recurrent block diffusion to embed predictive world models into end-to-end driving policies.

UniTeD: Unified Temporal Diffusion for Joint Perception and Planning in Autonomous Driving

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

UniTeD unifies perception and planning in autonomous driving via shared temporal diffusion with TTM and ARS modules, reporting SOTA results on benchmarks.

World Engine: Towards the Era of Post-Training for Autonomous Driving

cs.RO · 2026-06-18 · unverdicted · novelty 6.0

World Engine generates realistic safety-critical driving variations from logs for reinforcement post-training, reducing benchmark failures more than data scaling and showing collision reductions plus on-road gains in a production system.

VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

VLGA introduces geometry as a fourth modality in VLA models via pointmap regression loss, reporting SOTA open-loop and closed-loop driving metrics on nuScenes and Bench2Drive.

D$^3$-MoE: Dual Disentangled Diffusion Mixture-of-Experts for Style-Controllable End-to-End Autonomous Driving

cs.RO · 2026-06-03 · unverdicted · novelty 6.0

D³-MoE disentangles style and physical axes with diffusion and self-supervised MoE experts to produce style-controllable trajectories, reporting SOTA 88.2 PDMS on NAVSIM.

IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

IDOL uses inverse dynamics on adjacent predicted latent futures to extract planning-relevant motion deltas, then optimizes trajectories with a closed-loop refinement step, reporting SOTA results on NAVSIM v1 and v2.

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

A structured perturbation framework applied to VLA driving models reveals evaluation-dependent visual grounding patterns and uneven dependency across abstraction levels.

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

AnyScene is an occupancy-centric framework using a Spatial-Temporal Occupancy Diffusion Transformer and Geometry-Grounded View Expansion to generate controllable driving scenes and videos from BEV layouts.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

CLAP reduces planning error on challenging driving scenarios by 24% on NAVSIM using contrastive latent-space prompt optimization on frozen VLA models with no regression on normal frames.

CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning

cs.RO · 2026-05-14 · conditional · novelty 6.0

CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

VECTOR-DRIVE uses shared self-attention with semantic-aware expert routing of tokens to VL and trajectory experts plus flow-matching action decoding to reach 88.91 driving score on Bench2Drive.

citing papers explorer

Showing 28 of 28 citing papers after filters.

VLADriveBench: Evaluating CoT-Action Relationship in VLA for Autonomous Driving cs.CV · 2026-06-10 · unverdicted · none · ref 18 · internal anchor
VLADriveBench combines observational metrics and CoT intervention protocols to evaluate the relevance and causality of reasoning in vision-language-action models for autonomous driving, revealing divergent model behaviors.
Grounding Driving VLA via Inverse Kinematics cs.CV · 2026-05-20 · conditional · none · ref 25 · internal anchor
By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.
The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models cs.CV · 2026-04-06 · unverdicted · none · ref 24 · internal anchor
Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.
Teaching Vision-Language-Action Models What to See and Where to Look cs.CV · 2026-07-02 · unverdicted · none · ref 29 · internal anchor
DriveTeach-VLA adds Driving-aware Vision Distillation pretraining and 2D Trajectory-Guided Prompts to VLA models, then reports state-of-the-art results on NAVSIM and nuScenes.
DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving cs.CV · 2026-07-01 · unverdicted · none · ref 20 · internal anchor
DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.
X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving cs.CV · 2026-06-27 · unverdicted · none · ref 8 · internal anchor
X-Mind proposes an efficient internal visual chain-of-thought using compressed BEV sketches and recurrent block diffusion to embed predictive world models into end-to-end driving policies.
UniTeD: Unified Temporal Diffusion for Joint Perception and Planning in Autonomous Driving cs.CV · 2026-06-24 · unverdicted · none · ref 33 · internal anchor
UniTeD unifies perception and planning in autonomous driving via shared temporal diffusion with TTM and ARS modules, reporting SOTA results on benchmarks.
VLGA: Vision-Language-Geometry-Action Models for Autonomous Driving cs.CV · 2026-06-10 · unverdicted · none · ref 22 · internal anchor
VLGA introduces geometry as a fourth modality in VLA models via pointmap regression loss, reporting SOTA open-loop and closed-loop driving metrics on nuScenes and Bench2Drive.
Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior? cs.CV · 2026-05-29 · unverdicted · none · ref 12 · internal anchor
A structured perturbation framework applied to VLA driving models reveals evaluation-dependent visual grounding patterns and uneven dependency across abstraction levels.
CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving cs.CV · 2026-05-17 · unverdicted · none · ref 23 · internal anchor
CLAP reduces planning error on challenging driving scenarios by 24% on NAVSIM using contrastive latent-space prompt optimization on frozen VLA models with no regression on normal frames.
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving cs.CV · 2026-05-11 · unverdicted · none · ref 50 · 2 links · internal anchor
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.
DriveFuture: Future-Aware Latent World Models for Autonomous Driving cs.CV · 2026-05-10 · unverdicted · none · ref 17 · internal anchor
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving cs.CV · 2026-05-09 · unverdicted · none · ref 29 · 2 links · internal anchor
VECTOR-DRIVE uses shared self-attention with semantic-aware expert routing of tokens to VL and trajectory experts plus flow-matching action decoding to reach 88.91 driving score on Bench2Drive.
Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset cs.CV · 2026-04-24 · unverdicted · none · ref 16 · internal anchor
The paper creates the LTD dataset for open-ended traffic VQA and trains the UniVLT model, reporting SOTA on unified microscopic AD and macroscopic traffic reasoning tasks.
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation cs.CV · 2026-04-20 · unverdicted · none · ref 62 · 2 links · internal anchor
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models cs.CV · 2026-04-20 · unverdicted · none · ref 28 · internal anchor
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale cs.CV · 2026-04-01 · unverdicted · none · ref 37 · internal anchor
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.
DriveLaW: Unifying Planning and Video Generation in a Latent Driving World cs.CV · 2025-12-29 · unverdicted · none · ref 44 · internal anchor
DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.
SimScale: Learning to Drive via Real-World Simulation at Scale cs.CV · 2025-11-28 · conditional · none · ref 51 · internal anchor
SimScale synthesizes unseen driving states from real logs via neural rendering and reactive environments, generates pseudo-expert trajectories, and shows that co-training on real plus simulated data improves planning robustness and generalization on real benchmarks, with gains scaling by simulation
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving cs.CV · 2026-06-07 · unverdicted · none · ref 13 · internal anchor
The paper creates the DriveReward dataset with counterfactual annotations and a 1B VLM reward model that outperforms larger VLMs on driving tasks and matches rule-based rewards in RL and trajectory scoring.
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model cs.CV · 2026-05-21 · unverdicted · none · ref 32 · internal anchor
LVDrive improves closed-loop driving on Bench2Drive by adding latent future scene prediction to VLA models via unified embedding space processing and two-stage trajectory decoding.
EponaV2: Driving World Model with Comprehensive Future Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 38 · internal anchor
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model cs.CV · 2026-04-21 · unverdicted · none · ref 41 · internal anchor
SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework cs.CV · 2026-04-16 · unverdicted · none · ref 29 · internal anchor
RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.
DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving cs.CV · 2026-03-20 · unverdicted · none · ref 26 · internal anchor
DynFlowDrive models action-conditioned scene transitions via rectified flow in latent space and adds stability-aware trajectory selection, showing gains on nuScenes and NAVSIM without added inference cost.
DriveStack-VLA: Render-Teacher Alignment for BEV-Based DeepStack Vision-Language-Action Model cs.CV · 2026-06-23 · unverdicted · none · ref 7 · internal anchor
DriveStack-VLA injects BEV into the VLM decoder, aligns real and rasterized image focus, and adds head-based trajectory self-critique, reporting 91.6 PDMS on NAVSIM v1 and a 79.49 driving score on Bench2Drive.
Intend, Reflect, Refine: An Adaptive Multimodal Reflection Framework for Autonomous Driving cs.CV · 2026-06-22 · unverdicted · none · ref 17 · internal anchor
IRR-Drive adds an adaptive multimodal reflection step (text intention plus predicted future BEV) that lets a VLA model self-correct its trajectory plan according to scene complexity and reports SOTA on NAVSIM.
ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving cs.CV · 2026-04-03 · unreviewed · ref 29 · internal anchor
