{"total":14,"items":[{"citing_arxiv_id":"2605.31116","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NTR: Neural Token Reconstruction for Scene Token Bottleneck in End-to-End Driving","primary_cat":"cs.CV","submitted_at":"2026-05-29T10:27:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NTR adds a self-distillation masked latent reconstruction objective that uses only scene tokens to reconstruct masked patch features, improving visual representation quality and planning performance in end-to-end autonomous driving.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21061","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Grounding Driving VLA via Inverse Kinematics","primary_cat":"cs.CV","submitted_at":"2026-05-20T11:45:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15120","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning","primary_cat":"cs.RO","submitted_at":"2026-05-14T17:32:18+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14696","ref_index":99,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EponaV2: Driving World Model with Comprehensive Future Reasoning","primary_cat":"cs.CV","submitted_at":"2026-05-14T11:12:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ResAD: Normalized residual trajectory modeling for end-to-end autonomous driving.arXiv preprint arXiv:2510.08562, 2025. [98] Zewei Zhou, Tianhui Cai, Yun Zhao, Seth Z.and Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Au- toVLA: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning.arXiv preprint arXiv:2506.13757, 2025. [99] Jialv Zou, Shaoyu Chen, Bencheng Liao, Zhiyu Zheng, Yuehao Song, Lefei Zhang, Qian Zhang, Wenyu Liu, and Xinggang Wang. DiffusionDriveV2: Reinforcement learning-constrained truncated diffusion modeling in end-to-end autonomous driving.arXiv preprint arXiv:2512.07745, 2025. 15"},{"citing_arxiv_id":"2605.12625","ref_index":29,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Driving Intents Amplify Planning-Oriented Reinforcement Learning","primary_cat":"cs.RO","submitted_at":"2026-05-12T18:10:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10904","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems","primary_cat":"cs.RO","submitted_at":"2026-05-11T17:44:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"While this structure is easier to design and maintain, it is limited by error accumulation across modules [38]. In contrast, end-to-end approaches have become the mainstream, which eliminate inter-module error propagation, simplify the system, and enable joint optimization for the final planning task [39- 42]. However, current paradigms primarily focus on single-agent systems and remain constrained by local perception and passive planning [43, 23]. Long-tail scenarios have become a key focus in autonomous driving [24, 44]. However, the potential of multi-agent systems, i.e., perception sharing and decision negotiation, especially in scenarios where existing systems fail, remains underexplored. Cooperative Driving System.Multi-agent systems facilitate seamless information sharing and cooperative driving between CA Vs and intelligent infrastructure, offering a promising paradigm"},{"citing_arxiv_id":"2605.09701","ref_index":63,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DriveFuture: Future-Aware Latent World Models for Autonomous Driving","primary_cat":"cs.CV","submitted_at":"2026-05-10T18:45:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"3 98.3 83.1 -DiffusionDriveV2 [62] 97.7 96.6 99.2 99.8 88.9 97.2 96.0 97.8 91.0 85.5 87.5 VLA-based MethodsDriveWorld-VLA [20] 98.6 99.1 99.6 99.8 87.4 97.9 97.0 97.8 78.6 - 86.8DriveVLA-W0 [21] 98.5 99.1 98.0 99.7 86.4 98.1 93.2 97.9 58.9 - 86.1Recogdrive [16] 98.3 95.2 98.3 99.8 87.1 97.5 96.6 99.5 86.5 - 83.6 World-Model-based MethodsLatent-W AM [63] 98.1 97.3 99.6 99.8 87.7 97.3 97.6 98.1 87.3 - 89.3DriveFuture 98.8 99.1 99.6 99.9 86.6 98.4 96.4 98.3 74.8 86.4 89.9 4 Experiments 4.1 Datasets and Evaluation Metrics We evaluate DriveFuture on the publicNAVSIMbenchmark [1], which is built on OpenScene [50] and nuPlan [51] logs for lightweight planning evaluation. We report results onNAVSIM-v1 navtest[ 1],"},{"citing_arxiv_id":"2605.04470","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies","primary_cat":"cs.LG","submitted_at":"2026-05-06T03:49:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop performance on Bench2Drive across multiple driving architectures.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Direct preference optimization: Your language model is secretly a reward model.Advances in Neural Information Processing Systems, 36: 53728-53741, 2023. [34] Y . Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving.arXiv preprint arXiv:2506.08052, 2025. [35] J. Zou, S. Chen, B. Liao, Z. Zheng, Y . Song, L. Zhang, Q. Zhang, W. Liu, and X. Wang. Diffusiondrivev2: Reinforcement learning-constrained truncated diffusion modeling in end-to-end autonomous driving.arXiv preprint arXiv:2512.07745, 2025. [36] R. Yasarla, D. Hegde, S. Han, H.-P. Cheng, Y . Shi, M. Sadeghigooghari, S. Mahajan, A. Bhattacharyya, L."},{"citing_arxiv_id":"2604.19710","ref_index":85,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model","primary_cat":"cs.CV","submitted_at":"2026-04-21T17:34:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SpanVLA reduces action generation latency via flow-matching conditioned on history and improves robustness by training on negative-recovery samples with GRPO and a dedicated reasoning dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15308","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework","primary_cat":"cs.CV","submitted_at":"2026-04-16T17:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"reinforcement learning training school for autonomous driv- ing.arXiv preprint arXiv:2010.09776, 2020. 3 [66] Zewei Zhou, Tianhui Cai, Seth Z Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Autovla: A vision- language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning.arXiv preprint arXiv:2506.13757, 2025. 2, 3 [67] Jialv Zou, Shaoyu Chen, Bencheng Liao, Zhiyu Zheng, Yue- hao Song, Lefei Zhang, Qian Zhang, Wenyu Liu, and Xing- gang Wang. Diffusiondrivev2: Reinforcement learning- constrained truncated diffusion modeling in end-to-end au- tonomous driving.arXiv preprint arXiv:2512.07745, 2025. 1, 3"},{"citing_arxiv_id":"2604.12656","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving","primary_cat":"cs.RO","submitted_at":"2026-04-14T12:24:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FeaXDrive improves end-to-end autonomous driving by shifting diffusion planning to a trajectory-centric formulation with curvature-constrained training, drivable-area guidance, and GRPO post-training, yielding stronger closed-loop performance and feasibility on NAVSIM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10856","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving","primary_cat":"cs.RO","submitted_at":"2026-04-12T23:37:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"value estimation beyondHis safely ignored, restricting the estimator to evaluate only the high- confidence window bounded by the model's explicit spatial-temporal plan. The implementation of the expected Q-function is detailed in Appendix C.2. As illustrated in Fig. 4(a), during closed-loop execution, rather than defaulting to the best-scoring sequence from the biased open-loop policies [40, 39], the agent selects the candidate trajectory that maximizes ˆQπ. Operating continuously at each decision step, this mechanism dynamically filters out plans that suffer from the biased estimation defined in Section 4.3. 5.2.2 Adaptive Replan Standard test-time scorers typically operate in a memoryless fashion by selecting exclusively from the current candidate setA t proposed by the policy, which often leads to high-frequency action chat-"},{"citing_arxiv_id":"2604.02714","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving","primary_cat":"cs.CV","submitted_at":"2026-04-03T04:14:13+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"498.385.4 64.0 TransFuser [5] 96.9 89.9 97.8 99.7 87.1 95.4 92.798.387.2 76.7 Hydra-MDP++ [26]97.2 97.5 99.4 99.6 83.1 96.5 94.4 98.2 70.9 81.4 DriveSuprem [47] 97.5 96.5 99.4 99.6 88.4 96.6 95.598.377.0 83.1 ARTEMIS [9] 98.3 95.1 98.699.8 81.5 97.4 96.598.3- 83.1 DiffusionDrive [32]98.2 95.9 99.4 99.8 87.5 97.3 96.8 98.387.7 84.5 DiffusionDriveV2 [59]97.7 96.6 99.299.8 88.997.2 96.0 97.891.0 85.5 DriveVLA-W0 [27]98.5 99.198.0 99.7 86.4 98.1 93.2 97.9 58.9 86.1 ExploreVLA 98.896.299.6 99.8 87.198.2 97.8 98.386.8 88.8 GT action L2 error: 0.0 Exploration Bonus: 0.0 Action1 L2 error: 6.3 Exploration Bonus: 0.0 Action2 L2 error: 2.9 Exploration Bonus: 0.32 Fig. 3: Analysis of the exploration bonus."},{"citing_arxiv_id":"2603.09465","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EvoDriveVLA: Evolving Driving VLA Models via Collaborative Perception-Planning Distillation","primary_cat":"cs.CV","submitted_at":"2026-03-10T10:19:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EvoDriveVLA uses collaborative perception-planning distillation with self-anchor and future-aware teachers to fix perception degradation and long-term instability in driving VLA models, reaching SOTA on nuScenes and NAVSIM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}