pith. machine review for the scientific record.

arxiv: 2605.00066 · v1 · submitted 2026-04-30 · 💻 cs.RO

Recognition: unknown

Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 20:52 UTC · model grok-4.3

classification 💻 cs.RO
keywords autonomous driving, open-loop evaluation, closed-loop performance, NAVSIM, Bench2Drive, correlation analysis, PDM Score, Ego Progress

The pith

NAVSIM open-loop PDM Score correlates at ρ=0.90 with closed-loop Bench2Drive Driving Score but shows ranking inversions and can be matched by a simpler three-metric version.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper checks whether newer safety-aware open-loop metrics from NAVSIM can forecast actual closed-loop driving performance in Bench2Drive. Pairing published results yields eight methods with complete data, where the overall PDM Score shows a strong positive yet non-monotonic correlation with the closed-loop Driving Score, including some clear ranking reversals. Among the sub-metrics, Ego Progress turns out to be the single best predictor of closed-loop success, outperforming the collision metric. A reduced three-metric combination delivers the same Spearman correlation of 0.90 as the full five-metric score. The study also notes that methods emphasizing safety at the expense of progress rank high in open-loop but incur timeout penalties in closed-loop.

Core claim

Compiling paired NAVSIM sub-metrics and Bench2Drive scores for eight methods shows the aggregate PDM Score correlates positively with closed-loop Driving Score at Spearman ρ=0.90 but non-monotonically, with ranking inversions. Ego Progress is the strongest single sub-metric predictor, exceeding the safety-critical No Collision metric. A simpler three-metric formula matches the predictive power of the full five-metric PDM Score on the same n=8 sample. The safety-progress trade-off appears differently across the two regimes, with the snowball effect of accumulating open-loop deviations offered as a candidate mechanism for the residual gap.

What carries the argument

Paired dataset of eight methods' NAVSIM open-loop sub-metrics (including Ego Progress and PDM Score) and Bench2Drive closed-loop Driving Scores, analyzed via Spearman rank correlations and ranking comparisons.
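The correlation machinery here is standard. As a minimal sketch with invented placeholder scores (the paper's per-method values are not reproduced here), Spearman's ρ is just rank correlation; for tie-free data the closed form 1 − 6Σd²/(n(n²−1)) suffices, and in practice `scipy.stats.spearmanr` would be used instead:

```python
def ranks(values):
    # 1-based ranks; the toy scores below have no ties, so plain ordering suffices.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        out[i] = float(r)
    return out

def spearman(x, y):
    # Tie-free closed form: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical paired results for eight methods -- NOT the paper's numbers.
pdm_score     = [88.1, 91.6, 85.2, 90.3, 82.7, 89.5, 86.0, 84.4]  # NAVSIM, open-loop
driving_score = [70.2, 66.8, 61.5, 72.9, 55.1, 71.4, 63.0, 58.7]  # Bench2Drive, closed-loop

print(round(spearman(pdm_score, driving_score), 3))  # -> 0.857 on this toy data
```

Ranking inversions of the kind the paper reports are visible directly in the rank vectors: any method whose rank under `pdm_score` differs from its rank under `driving_score` is an inversion candidate.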

If this is right

  • A three-metric shortcut can be used in place of the full PDM Score for closed-loop ranking with no loss in accuracy on current methods.
  • Planners that maximize safety by minimizing progress in open-loop evaluation tend to underperform in closed-loop due to timeout and slow-driving penalties.
  • Ego Progress should receive higher weight in future open-loop metric design to improve alignment with closed-loop outcomes.
  • Within present state-of-the-art, TTC and Comfort metrics add little marginal information for predicting closed-loop success.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pattern holds on larger samples, benchmark designers could simplify NAVSIM-style scoring to fewer metrics focused on progress.
  • The snowball effect suggests that open-loop metrics might be augmented with explicit models of deviation accumulation to better anticipate closed-loop failures.
  • Extending the pairing exercise to additional closed-loop benchmarks would test whether the observed correlation generalizes beyond Bench2Drive.

Load-bearing premise

The eight methods with complete paired data represent current state-of-the-art planners, and the published open-loop and closed-loop results are directly comparable, with no hidden differences in simulation setups or evaluation protocols.

What would settle it

A study adding more methods or different benchmark pairs that finds Spearman correlation between PDM Score and Driving Score below 0.7, or where the three-metric formula loses its matching accuracy, would falsify the claimed predictive equivalence.

Figures

Figures reproduced from arXiv: 2605.00066 by Anqing Jiang, Hai Yang, Hao Sun, Shuo Wang, Yang Chen, Yiru Wang, Yuwen Heng.

Figure 1. Can open-loop metrics predict closed-loop driving?
Figure 2. NAVSIM PDMS vs. Bench2Drive Driving Score (n = 8, Spearman ρ = 0.90, p = 0.002). SafeDrive (▲) is the primary outlier: it ranks 3rd in PDMS (91.6) but drops to 5th in DS (66.77), a two-position ranking inversion due to the safety–progress trade-off analyzed in §5.3.
Figure 3. Ego Progress (EP) vs. Bench2Drive Driving Score for the n = 8 methods with complete sub-metric data (Spearman ρ = 0.83). EP exhibits the strongest correlation among individual sub-metrics.
read the original abstract

Open-loop evaluation offers fast, reproducible assessment of autonomous driving planners, but its ability to predict real closed-loop driving performance remains questionable. Prior work has shown that traditional open-loop metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) exhibit no reliable correlation with closed-loop Driving Score. In this paper, we ask whether the more recent, safety-aware open-loop metrics introduced by NAVSIM~v2 can bridge this gap. By systematically cross-referencing published results from 15 state-of-the-art methods across NAVSIM (open-loop) and Bench2Drive (closed-loop), we compile a paired dataset of open-loop sub-metrics and closed-loop performance, yielding 8 methods with complete paired data. Our analysis reveals three key findings: (1) the aggregate NAVSIM PDM Score shows a strong positive but non-monotonic correlation with Bench2Drive Driving Score, with clear ranking inversions; (2) among individual NAVSIM sub-metrics, Ego Progress (EP) is the strongest single predictor of closed-loop success, substantially exceeding the safety-critical collision metric NC; (3) the safety-progress trade-off manifests differently in open-loop and closed-loop: methods that maximize safety at the expense of progress rank highly in NAVSIM but underperform in closed-loop due to timeout and slow-driving penalties. We further demonstrate that a much simpler 3-metric formula matches the predictive power of the full 5-metric PDMS at the same Spearman $\rho{=}0.90$ on our paired sample of $n{=}8$ methods, suggesting that within current state-of-the-art methods -- where TTC and Comfort approach saturation -- these two sub-metrics add little marginal information for closed-loop ranking. Additionally, we identify the snowball effect -- where small open-loop deviations compound into closed-loop failures -- as a candidate mechanism for the residual gap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript conducts a cross-benchmark correlation study between open-loop NAVSIM metrics (including the aggregate PDM Score and its sub-metrics) and closed-loop Bench2Drive Driving Scores. Using published results from 15 SOTA methods, it compiles paired data for 8 methods and reports a strong positive but non-monotonic Spearman correlation (ρ=0.90) between PDM Score and Driving Score with ranking inversions, identifies Ego Progress (EP) as the strongest single open-loop predictor, and shows that a post-hoc 3-metric formula matches the predictive power of the full 5-metric PDMS on the same n=8 sample. It attributes residual gaps to a snowball effect and notes that TTC and Comfort appear saturated within current SOTA.

Significance. If the correlations and simplification hold on larger, independently validated samples, the work would be significant for autonomous driving evaluation by providing concrete evidence that full open-loop suites like PDMS may be overparameterized for closed-loop prediction and by highlighting non-monotonicity and the open-to-closed-loop gap. The transparency in reporting specific Spearman values and explicit ranking inversions is a strength that enables falsifiable follow-up.

major comments (3)
  1. [Paired dataset construction and results] Paired data compilation and correlation analysis: All headline results (PDM Score vs. Driving Score ρ=0.90, EP as top predictor, non-monotonicity with inversions, and the 3-metric formula) are computed exclusively on the n=8 methods with complete paired data. No bootstrap intervals, statistical significance tests, or sensitivity analysis to outliers are reported, making the claims vulnerable to small-sample artifacts.
  2. [Simplified metric formula and ablation] 3-metric formula: The selection of which two sub-metrics (TTC, Comfort) to drop and the demonstration that the resulting formula matches full PDMS at ρ=0.90 both occur on the identical n=8 sample, introducing circularity. The claim that these metrics 'approach saturation' is therefore an in-sample observation without hold-out validation or external confirmation.
  3. [Cross-benchmark methodology] Comparability assumption: The analysis treats published open-loop and closed-loop results as directly comparable, but does not address potential hidden differences in simulation setups, evaluation protocols, or method-specific reporting choices that could affect the observed correlations.
minor comments (1)
  1. [Abstract] Clarify early (e.g., in the abstract or introduction) that while 15 methods are referenced, all quantitative claims rest on the subset of 8 with complete data.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which highlight important limitations in statistical rigor and methodological assumptions. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Paired dataset construction and results] Paired data compilation and correlation analysis: All headline results (PDM Score vs. Driving Score ρ=0.90, EP as top predictor, non-monotonicity with inversions, and the 3-metric formula) are computed exclusively on the n=8 methods with complete paired data. No bootstrap intervals, statistical significance tests, or sensitivity analysis to outliers are reported, making the claims vulnerable to small-sample artifacts.

    Authors: We acknowledge the small n=8 sample as a core limitation due to the scarcity of published paired results across benchmarks. In the revision, we will add bootstrap resampling (1000 iterations) to report 95% confidence intervals for the Spearman ρ values, including for PDM Score vs. Driving Score and the sub-metrics. We will also include leave-one-out sensitivity analysis to evaluate the stability of EP as the top predictor and the observed ranking inversions. These additions will be presented alongside an explicit discussion of small-sample caveats, framing the results as preliminary evidence rather than definitive claims. revision: yes

  2. Referee: [Simplified metric formula and ablation] 3-metric formula: The selection of which two sub-metrics (TTC, Comfort) to drop and the demonstration that the resulting formula matches full PDMS at ρ=0.90 both occur on the identical n=8 sample, introducing circularity. The claim that these metrics 'approach saturation' is therefore an in-sample observation without hold-out validation or external confirmation.

    Authors: The referee rightly points out the circularity in both selecting and validating the 3-metric formula on the same sample. We will revise the text to present this formula strictly as a post-hoc exploratory observation derived from saturation patterns visible in the current SOTA data distributions. We will report the full set of individual sub-metric correlations with Driving Score for transparency, allowing readers to evaluate contributions independently. The saturation statement will be qualified as an in-sample observation specific to existing methods, with a call for future hold-out validation on expanded datasets. No claim of generalizability beyond n=8 will be retained. revision: partial

  3. Referee: [Cross-benchmark methodology] Comparability assumption: The analysis treats published open-loop and closed-loop results as directly comparable, but does not address potential hidden differences in simulation setups, evaluation protocols, or method-specific reporting choices that could affect the observed correlations.

    Authors: We agree that unaddressed differences in simulation setups, scenario coverage, or reporting choices could confound the correlations. In revision, we will add a dedicated limitations subsection discussing these factors, noting that all methods were selected based on their use of official NAVSIM and Bench2Drive evaluation protocols as published. We will emphasize that our conclusions are conditional on these standard benchmarks and highlight the snowball effect as one mechanism for residual discrepancies. Despite potential confounders, the strength of the observed correlation (ρ=0.90) still provides useful evidence, but we will frame it more cautiously. revision: yes
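The statistical checks promised in these responses (bootstrap intervals, leave-one-out sensitivity) are routine to implement. A self-contained sketch with invented placeholder scores, handling the rank ties that bootstrap resampling introduces:

```python
import random

def avg_ranks(values):
    # Average ranks with tie handling (bootstrap resamples duplicate methods).
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    # Pearson correlation on (possibly tied) ranks; None if a resample is degenerate.
    rx, ry = avg_ranks(x), avg_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(x, y, iters=1000, seed=0):
    # Percentile 95% CI for Spearman rho under resampling with replacement.
    rng = random.Random(seed)
    n = len(x)
    rhos = []
    while len(rhos) < iters:
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:
            rhos.append(r)
    rhos.sort()
    return rhos[int(0.025 * iters)], rhos[int(0.975 * iters) - 1]

# Hypothetical paired scores -- NOT the paper's numbers.
pdms = [88.1, 91.6, 85.2, 90.3, 82.7, 89.5, 86.0, 84.4]
ds   = [70.2, 66.8, 61.5, 72.9, 55.1, 71.4, 63.0, 58.7]

lo, hi = bootstrap_ci(pdms, ds)
# Leave-one-out: how much does dropping each single method move the correlation?
loo = [spearman(pdms[:i] + pdms[i + 1:], ds[:i] + ds[i + 1:]) for i in range(len(pdms))]
print(f"95% CI [{lo:.2f}, {hi:.2f}], LOO range [{min(loo):.2f}, {max(loo):.2f}]")
```

With n = 8, expect both intervals to be wide; a broad CI or a large leave-one-out spread would be exactly the small-sample fragility the referee flags.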

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper performs an empirical cross-benchmark analysis by compiling published results from external methods on NAVSIM and Bench2Drive, computing Spearman correlations and identifying predictors such as Ego Progress directly from those data. The observation that a 3-metric subset achieves the same ρ=0.90 on the n=8 paired sample is an in-sample comparison rather than a derivation that reduces to its own inputs by construction. No equations are shown to be self-referential, no parameters are fitted and then relabeled as predictions, and no load-bearing claims rest on self-citations or imported uniqueness theorems. The central findings rely on external published benchmarks and are therefore self-contained against those independent sources.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on the assumption that published benchmark scores from different papers are comparable and that standard Spearman correlation is appropriate for ranking prediction. No new physical or algorithmic axioms are introduced.

axioms (2)
  • domain assumption Published open-loop and closed-loop scores from different papers can be directly paired and compared without protocol mismatches
    The study compiles results across NAVSIM and Bench2Drive without re-running experiments or verifying identical simulation conditions.
  • standard math Spearman rank correlation is a valid measure of predictive power for closed-loop success
    Used to quantify correlation between PDM Score and Driving Score.
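The second axiom can be probed directly: at n = 8 there are only 8! = 40,320 rank orderings, so the p-value for the headline ρ = 0.90 is computable exactly by enumeration rather than by a large-sample approximation. A sketch, assuming the tie-free closed form (the 0.90 threshold is the paper's reported value):

```python
from itertools import permutations
from math import factorial

def spearman_from_ranks(rx, ry):
    # Tie-free closed form for Spearman rho given two rank vectors.
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

n = 8
observed = 0.90  # headline PDMS-vs-DS correlation
base = tuple(range(1, n + 1))

# Exact one-sided test: fraction of permutations at least as extreme as observed.
hits = sum(1 for perm in permutations(base)
           if spearman_from_ranks(base, perm) >= observed)
p_exact = hits / factorial(n)
print(f"exact one-sided p = {p_exact:.4f}")
```

A value close to the p = 0.002 quoted in Figure 2 would indicate the reported significance survives exact small-sample treatment.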

pith-pipeline@v0.9.0 · 5667 in / 1736 out tokens · 36948 ms · 2026-05-09T20:52:46.480691+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 18 canonical work pages · 3 internal anchors

  1. [1]

    CARLA: An open urban driving simulator

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Conference on Robot Learning (CoRL), 2017

  2. [2]

    Parting with misconceptions about learning-based vehicle motion planning

    Daniel Dauner, Marcel Hallgarten, Andreas Geiger, and Kashyap Chitta. Parting with misconceptions about learning-based vehicle motion planning. In Conference on Robot Learning (CoRL), 2023. arXiv:2306.07962

  3. [3]

    Is ego status all you need for open-loop end-to-end autonomous driving? In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

    Zhiqi Li, Zhiding Yu, Shiyi Lan, et al. Is ego status all you need for open-loop end-to-end autonomous driving? In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  4. [4]

    NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking

    Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024

  5. [5]

    Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving

    Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024

  6. [6]

    nuScenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, et al. nuScenes: A multimodal dataset for autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  7. [7]

    Argoverse 2: Next generation datasets for self-driving perception and forecasting

    Benjamin Wilson, William Qi, Tanmay Aber, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  8. [8]

    Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving

    Napat Karnchanachari et al. Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving. In IEEE International Conference on Robotics and Automation (ICRA), 2024

  9. [9]

    Bench2Drive-VL: Closed-loop VLM evaluation for autonomous driving

    Xiaosong Jia et al. Bench2Drive-VL: Closed-loop VLM evaluation for autonomous driving. arXiv preprint arXiv:2604.01259, 2026

  10. [10]

    Scaling Laws of Motion Forecasting and Planning – Technical Report, 2025

    Mustafa Baniodeh, Kratarth Goel, Scott Ettinger, Carlos Fuertes, Ari Seff, Tim Shen, Cole Gulino, Chenjie Yang, Ghassen Jerfel, Sergio Casas, et al. Scaling laws of motion forecasting and planning – technical report. arXiv preprint arXiv:2506.08228, 2025

  11. [11]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, et al. Planning-oriented autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  12. [12]

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, et al. Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024

  13. [13]

    DiffusionDrive: Truncated diffusion model for end-to-end autonomous driving

    Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, et al. DiffusionDrive: Truncated diffusion model for end-to-end autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  14. [14]

    Hydra-NeXt: Robust closed-loop driving with open-loop training

    Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, and Jose M Alvarez. Hydra-NeXt: Robust closed-loop driving with open-loop training. arXiv preprint arXiv:2503.12030, 2025

  15. [15]

    GoalFlow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving

    Zebin Xing, Xinhao Zhang, Yanjun Hu, Bo Jiang, et al. GoalFlow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  16. [16]

    SparseDriveV2: Scoring is all you need for end-to-end autonomous driving

    Wenchao Sun, Xuewu Lin, Keyu Chen, Zixiang Pei, Xiang Li, Yining Shi, and Sifa Zheng. SparseDriveV2: Scoring is all you need for end-to-end autonomous driving. arXiv preprint arXiv:2603.29163, 2026

  17. [17]

    SafeDrive: Fine-grained safety reasoning for end-to-end driving in a sparse world

    Jungho Kim, Jiyong Oh, Seunghoon Yu, Hongjae Shin, Donghyuk Kwak, and Jun Won Choi. SafeDrive: Fine-grained safety reasoning for end-to-end driving in a sparse world. arXiv preprint arXiv:2602.18887, 2026. CVPR 2026

  18. [18]

    DriveTransformer: Unified transformer for scalable end-to-end autonomous driving

    Xiaosong Jia, Junqi You, Zhiyuan Zhang, and Junchi Yan. DriveTransformer: Unified transformer for scalable end-to-end autonomous driving. In International Conference on Learning Representations (ICLR), 2025. arXiv:2503.07656

  19. [19]

    VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

    Bo Jiang, Shaoyu Chen, Hao Gao, Bencheng Liao, Qian Zhang, Wenyu Liu, and Xinggang Wang. VADv2: End-to-end vectorized autonomous driving via probabilistic planning. In International Conference on Learning Representations (ICLR), 2026. arXiv:2402.13243

  20. [20]

    End-to-end driving with online trajectory evaluation via BEV world model

    Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, and Zhaoxiang Zhang. End-to-end driving with online trajectory evaluation via BEV world model. arXiv preprint arXiv:2504.01941, 2025

  21. [21]

    DriveSuprim: Towards precise trajectory selection for end-to-end planning

    Wenhao Yao, Zhenxin Li, Shiyi Lan, Zi Wang, Xinglong Sun, Jose M Alvarez, and Zuxuan Wu. DriveSuprim: Towards precise trajectory selection for end-to-end planning. arXiv preprint arXiv:2506.06659, 2025. AAAI 2026

  22. [22]

    Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline

    Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, and Yu Qiao. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, 2022

  23. [23]

    VAD: Vectorized scene representation for efficient autonomous driving

    Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. VAD: Vectorized scene representation for efficient autonomous driving. In IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  24. [24]

    Think Twice before Driving: Towards scalable decoders for end-to-end autonomous driving

    Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, and Hongyang Li. Think Twice before Driving: Towards scalable decoders for end-to-end autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  25. [25]

    DriveAdapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving

    Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, and Junchi Yan. DriveAdapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving. In IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  26. [26]

    Bridging past and future: End-to-end autonomous driving with historical prediction and planning

    Bozhou Zhang, Nan Song, Xin Jin, and Li Zhang. Bridging past and future: End-to-end autonomous driving with historical prediction and planning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. arXiv:2503.14182

  27. [27]

    DriveDPO: Policy learning via safety DPO for end-to-end autonomous driving

    Shuyao Shang, Yuntao Chen, Yuqi Wang, Yingyan Li, and Zhaoxiang Zhang. DriveDPO: Policy learning via safety DPO for end-to-end autonomous driving. In Advances in Neural Information Processing Systems (NeurIPS), 2025. arXiv:2509.17940

  28. [28]

    SimLingo: Vision-only closed-loop autonomous driving with language-action alignment

    Katrin Renz, Long Chen, Elahe Arani, and Oleg Sinavski. SimLingo: Vision-only closed-loop autonomous driving with language-action alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. arXiv:2503.09594

  29. [29]

    HiP-AD: Hierarchical and multi-granularity planning with deformable attention for autonomous driving in a single decoder

    Yingqi Tang, Zhuoran Xu, Zhaotie Meng, and Erkang Cheng. HiP-AD: Hierarchical and multi-granularity planning with deformable attention for autonomous driving in a single decoder. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. arXiv:2503.08612

  30. [30]

    DiffRefiner: Coarse to fine trajectory planning via diffusion refinement with semantic interaction for end-to-end autonomous driving

    Liuhan Yin, Runkun Ju, Guodong Guo, and Erkang Cheng. DiffRefiner: Coarse to fine trajectory planning via diffusion refinement with semantic interaction for end-to-end autonomous driving. arXiv preprint arXiv:2511.17150, 2025. AAAI 2026

  31. [31]

    RAP: 3D rasterization augmented end-to-end planning

    Lan Feng, Yang Gao, Eloi Zablocki, Quanyi Li, Wuyang Li, Sichao Liu, Matthieu Cord, and Alexandre Alahi. RAP: 3D rasterization augmented end-to-end planning. arXiv preprint arXiv:2510.04333, 2025

  32. [32]

    LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

    Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, and Kashyap Chitta. LEAD: Minimizing learner-expert asymmetry in end-to-end driving. arXiv preprint arXiv:2512.20563, 2025. CVPR 2026