{"paper":{"title":"ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"ReCogDrive combines a vision-language model for cognition with a reinforced diffusion planner to generate feasible, safe driving trajectories.","cross_cats":["cs.RO"],"primary_cat":"cs.CV","authors_text":"Bing Wang, Fang Li, Gangwei Xu, Guang Chen, Haiyang Sun, Hangjun Ye, Kaixin Xiong, Kun Ma, Lijun Zhou, Long Chen, Sixu Yan, Wenyu Liu, Xiangyu Guo, Xinggang Wang, Yongkang Li","submitted_at":"2025-06-09T03:14:04Z","abstract_excerpt":"Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and pla"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ReCogDrive achieves state-of-the-art performance on the NAVSIM and Bench2Drive benchmarks while demonstrating strong scene comprehension across diverse driving scenarios.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The hierarchical data pipeline (generation, refinement, quality control) successfully instills transferable human driving cognition into the VLM without introducing dataset-specific biases that limit generalization to real-world conditions.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ReCogDrive combines a vision-language model for cognition with a reinforced diffusion planner to generate feasible, safe driving trajectories.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"55f72864b9aded7b041f4abd791c40db0758cf1b3f7110230602fb4836f27ad4"},"source":{"id":"2506.08052","kind":"arxiv","version":2},"verdict":{"id":"dfee4c2c-5de7-4b1e-bcb9-aaa919fcdc8f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T07:32:02.372355Z","strongest_claim":"ReCogDrive achieves state-of-the-art performance on the NAVSIM and Bench2Drive benchmarks while demonstrating strong scene comprehension across diverse driving scenarios.","one_line_summary":"ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The hierarchical data pipeline (generation, refinement, quality control) successfully instills transferable human driving cognition into the VLM without introducing dataset-specific biases that limit generalization to real-world conditions.","pith_extraction_headline":"ReCogDrive combines a vision-language model for cognition with a reinforced diffusion planner to generate feasible, safe driving trajectories."},"references":{"count":45,"sample":[{"doi":"","year":null,"title":"Phi-4 Technical Report","work_id":"b6274271-7af9-4ee8-993b-ba1ba4205ba8","ref_index":1,"cited_arxiv_id":"2412.08905","is_internal_anchor":true},{"doi":"","year":null,"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","ref_index":2,"cited_arxiv_id":"2502.13923","is_internal_anchor":true},{"doi":"","year":null,"title":"Is a 3d-tokenized LLM the key to reliable autonomous driving? CoRR, abs/2405.18361, 2024","work_id":"f6f04e40-cb7f-4d15-b4d7-b0a9b79c2f2a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","work_id":"e2db69c7-ee8a-4cb7-a761-7b8de1dfcf97","ref_index":4,"cited_arxiv_id":"2503.14734","is_internal_anchor":true},{"doi":"","year":null,"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","ref_index":5,"cited_arxiv_id":"2410.24164","is_internal_anchor":true}],"resolved_work":45,"snapshot_sha256":"fbde30c0d11113843533d5bd0d40b2e88d6f96e3cfbdc1395d1312af96d22b92","internal_anchors":22},"formal_canon":{"evidence_count":3,"snapshot_sha256":"63586337561cb13aa62e45ff9c91453416dd1b783f351f272298a207a362121b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}