
arxiv: 2605.19771 · v1 · pith:SAD3A72Knew · submitted 2026-05-19 · 💻 cs.RO · cs.CV

Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

Pith reviewed 2026-05-20 05:24 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords imitation learning · autonomous driving · hard negatives · flow matching · trajectory planning · safety boundaries · end-to-end driving · failure-aware learning

The pith

Imitation learning for end-to-end autonomous driving improves safety by explicitly training on hard negative trajectories that are close to expert paths but lead to failure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning minimizes distance to expert trajectories but assumes spatial closeness guarantees safety, even though nearly identical paths can produce safe recovery or collisions. The paper proposes BeyondDrive to generate safety-critical negative trajectories via flow matching, sample them diversely, and apply a repulsive distance loss that pulls predictions toward experts while pushing them away from negatives. This creates explicit safety boundaries in trajectory space rather than relying on geometric proximity alone. A reader cares because the mismatch between imitation objectives and real safety outcomes limits current end-to-end systems in deployment. If correct, models can discriminate recoverable paths from dangerous ones without needing exhaustive real-world failure data.

Core claim

The authors claim that jointly learning from successful expert demonstrations and synthesized hard negative trajectories, produced by a flow matching generator with diversity-aware sampling, allows a repulsive distance loss to establish discriminative safety boundaries, addressing the objective mismatch where trajectories with similar imitation losses yield different safety results.

What carries the argument

The Repulsive Distance Loss, which attracts model outputs toward expert trajectories while repelling them from safety-critical yet expert-proximate negative trajectories generated by flow matching.
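
The attract-and-repel idea can be illustrated with a small sketch. The exact form of the Repulsive Distance Loss is not given on this page, so the hinge-with-margin repulsion, the mean L2 distance, and the names `mean_l2`, `repulsive_term`, `total_loss`, `lam_imi`, `lam_rd`, and `margin` below are all assumptions made for illustration, not the authors' equation.

```python
import math

def mean_l2(traj_a, traj_b):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.hypot(xa - xb, ya - yb)
               for (xa, ya), (xb, yb) in zip(traj_a, traj_b)) / len(traj_a)

def repulsive_term(pred, negatives, margin=1.0):
    """Assumed hinge form: only negatives closer than `margin` are penalized."""
    if not negatives:
        return 0.0
    return sum(max(0.0, margin - mean_l2(pred, n)) for n in negatives) / len(negatives)

def total_loss(pred, expert, negatives, lam_imi=1.0, lam_rd=0.1, margin=1.0):
    """Weighted sum of attraction to the expert and repulsion from negatives."""
    return (lam_imi * mean_l2(pred, expert)
            + lam_rd * repulsive_term(pred, negatives, margin))

# Toy trajectories (illustrative values, not data from the paper).
expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
negative = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
toward_negative = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]       # imitation error 0.5
away_from_negative = [(0.0, -0.5), (1.0, -0.5), (2.0, -0.5)]  # imitation error 0.5

loss_toward = total_loss(toward_negative, expert, [negative])
loss_away = total_loss(away_from_negative, expert, [negative])
```

In this toy setup both predictions sit at the same distance from the expert, so a pure imitation loss cannot tell them apart; the repulsive term is what makes the prediction lying on the negative more expensive.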

If this is right

  • The framework generalizes across uni-modal and multi-modal autonomous driving planners.
  • It demonstrates zero-shot transfer to additional benchmarks beyond the primary evaluation set.
  • Models learn explicit distinctions between safe and unsafe behaviors instead of depending solely on spatial closeness to experts.
  • Diversity-aware sampling during negative generation improves coverage of varied failure modes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Synthesizing negatives could reduce reliance on collecting rare real-world crash data for training.
  • The repulsive mechanism might extend to other imitation learning settings where positive examples dominate but boundary cases determine reliability.
  • Closed-loop gains may depend on how well the flow matching distribution matches the actual error distribution of the deployed planner.

Load-bearing premise

The generated negative trajectories accurately represent real-world failure modes that the model would encounter during actual deployment.

What would settle it

Measure whether the trained model shows a lower collision rate than the baseline on closed-loop scenarios that include the synthesized negative trajectories, while keeping comparable expert imitation error on normal driving cases.

Figures

Figures reproduced from arXiv: 2605.19771 by Guang Chen, Hangjun Ye, Haochen Tian, Junli Wang, Kun Ma, Long Chen, Qichao Zhang, Xueyi Liu, Zebin Xing, Zhihua Hua.

Figure 1: Learning safe autonomous driving from hard negatives. (a) Uni-modal and multi-modal end-to-end driving models learn by imitating expert demonstrations, yet they lack an understanding of "what constitutes poor imitation." (b) We generate unsafe hard negatives close to expert via flow matching, guiding models to avoid such behaviors. (c) Our approach yields significant performance improvements on reactive (E…
Figure 2: Overview of the BeyondDrive framework. In Stage 1, classifier-free guidance and noise standard deviation scaling are employed to generate diverse trajectory candidates near expert demonstrations, followed by safety-aware and distance-aware filtering to construct hard negative samples that are unsafe yet expert-proximate. In Stage 2, the imitation learning loss and repulsive distance loss jointly optimize t…
Figure 3: Case studies on the navtest benchmark. Left: front-view camera images. Middle: BEV scenes with expert demonstrations and LTF* planned trajectories (with scores). Right: expert demonstrations, LTFv7 planned trajectories and negative samples.
    … as the world model method WoTE. In all three methods, simply by adding one line of training loss code, we achieved PDMS gains of +1.1, +1.3, and +0.9 respectively, demo…
Figure 4: Impact of λrd/λimi on model performance.
Figure 5: (a) Zero-value counts of hard negative samples. (b) Error distribution between hard negative samples and expert demonstrations. (c) Metric correlations of hard negative samples.
Figure 6: Time and resources.
Figure 7: Case studies on the HUGSIM benchmark. The ego vehicle avoids a truck, overtakes a bus, and merges into the main road in a closed-loop environment.
Figure 8: Case studies on the HUGSIM benchmark. The ego vehicle passes a pedestrian crossing, maneuvers around a stationary car on the roadside, and returns to its current lane in a closed-loop environment.
Figure 9: A failure case on the HUGSIM benchmark. In a meeting scenario, the oncoming vehicle swerves into the ego lane, and the ego vehicle fails to avoid the collision.
Figure 10: Comparison of planning performance in a straight driving scenario.
Figure 11: Comparison of planning performance in a left-turn scenario.
Figure 12: Comparison of planning performance in a right-turn scenario.
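
The Figure 2 caption describes Stage 1 as generating candidates near the expert and then applying safety-aware and distance-aware filtering. A minimal sketch of such a filter is shown here. The thresholds `unsafe_below` and `max_dist`, the helper names, and the toy obstacle-based `toy_safety_score` are hypothetical stand-ins; the paper's actual safety metric and cut-offs are not stated on this page.

```python
import math

def mean_l2(traj_a, traj_b):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    return sum(math.hypot(xa - xb, ya - yb)
               for (xa, ya), (xb, yb) in zip(traj_a, traj_b)) / len(traj_a)

def select_hard_negatives(candidates, expert, safety_score,
                          unsafe_below=0.5, max_dist=1.0):
    """Keep candidates that are unsafe (low score) yet close to the expert."""
    kept = []
    for traj in candidates:
        is_unsafe = safety_score(traj) < unsafe_below       # safety-aware filter
        is_proximate = mean_l2(traj, expert) <= max_dist     # distance-aware filter
        if is_unsafe and is_proximate:
            kept.append(traj)
    return kept

# Hypothetical scorer: 0.0 if any waypoint is within 0.3 m of an obstacle, else 1.0.
OBSTACLE = (2.0, 0.6)

def toy_safety_score(traj):
    hit = any(math.hypot(x - OBSTACLE[0], y - OBSTACLE[1]) < 0.3 for x, y in traj)
    return 0.0 if hit else 1.0

expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
safe_near = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.1)]    # close to expert, no collision
unsafe_near = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.5)]  # close to expert, clips obstacle
unsafe_far = [(0.0, 3.0), (1.0, 3.0), (2.0, 0.6)]   # collides but far from expert

hard_negatives = select_hard_negatives(
    [safe_near, unsafe_near, unsafe_far], expert, toy_safety_score)
```

Only `unsafe_near` survives both filters in this example, which matches the caption's description of negatives that are "unsafe yet expert-proximate".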
Original abstract

Existing imitation learning methods for end-to-end autonomous driving predominantly learn from successful demonstrations by minimizing geometric deviations from expert trajectories. This paradigm implicitly assumes that spatial proximity implies behavioral safety, leading to a critical objective mismatch: trajectories with nearly identical imitation losses may exhibit drastically different safety outcomes, where one remains recoverable while the other results in collision. To address this limitation, we propose BeyondDrive, a failure-aware imitation learning framework that jointly learns from successful and failed driving behaviors. First, we introduce a flow matching-based negative trajectory generator that synthesizes safety-critical yet expert-proximate trajectories, enabling explicit modeling of safety asymmetry. Second, we develop a diversity-aware sampling strategy that mitigates mode collapse and improves coverage of diverse failure modes during negative trajectory generation. Third, we propose a Repulsive Distance Loss that simultaneously attracts predictions toward expert demonstrations while repelling them from hard negative trajectories, thereby establishing discriminative safety boundaries in trajectory space. Applied to the uni-modal baseline Latent TransFuser, BeyondDrive achieves 89.7 PDMS on the NAVSIMv1 closed-loop benchmark, outperforming prior state-of-the-art methods. Moreover, BeyondDrive generalizes effectively across different autonomous driving architectures, including multi-modal planners, and further demonstrates strong zero-shot transferability on the HUGSIM benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BeyondDrive, a failure-aware imitation learning framework for end-to-end autonomous driving. It augments standard imitation from expert trajectories with a flow matching-based negative trajectory generator that produces safety-critical yet expert-proximate failures, a diversity-aware sampling strategy to avoid mode collapse, and a Repulsive Distance Loss that attracts predictions to experts while repelling them from the hard negatives. Applied to the Latent TransFuser baseline, the method reports 89.7 PDMS on the NAVSIMv1 closed-loop benchmark (outperforming prior SOTA), effective generalization to multi-modal planners, and strong zero-shot transfer to the HUGSIM benchmark.

Significance. If the generated negatives faithfully capture real-world failure mode distributions, the approach directly addresses the objective mismatch between geometric imitation loss and actual safety outcomes, offering a principled way to learn discriminative safety boundaries. The reported benchmark gains, cross-architecture generalization, and zero-shot transfer results would represent a meaningful empirical advance in safe end-to-end driving if the central assumption about negative trajectory quality is substantiated with quantitative checks.

major comments (2)
  1. [Abstract] Abstract: The claim that the flow matching-based negative trajectory generator produces 'safety-critical yet expert-proximate trajectories' that accurately represent real-world failure modes is load-bearing for the safety improvement and the 89.7 PDMS result. No quantitative validation (e.g., Wasserstein distance to logged near-miss trajectories, distribution overlap metrics, or human realism ratings) is referenced to confirm that these synthetics occupy the same regions of trajectory space as actual deployment failures rather than a narrow subset of kinematic violations.
  2. [Experiments] Experiments (benchmark results and generalization claims): The performance gains and cross-architecture / zero-shot transfer assertions would be more convincing with ablations that isolate the repulsive loss contribution from the base model, negative generator, and diversity sampling. Absence of error bars, multiple random seeds, or statistical significance tests leaves the reliability of the 89.7 PDMS score and generalization results unclear.
minor comments (2)
  1. [Method] Clarify the precise mathematical definition of the Repulsive Distance Loss (including any weighting coefficients) with an explicit equation to support reproducibility.
  2. [Figures] Ensure trajectory visualizations in figures clearly label expert vs. negative samples and illustrate the diversity achieved by the sampling strategy.
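
As a rough illustration of the distributional check raised in major comment 1, the sketch below computes a one-dimensional Wasserstein-1 distance between two equally sized empirical samples. It assumes each trajectory has already been reduced to a single scalar feature (for example, a maximum lateral deviation), which is a simplification introduced here and not something the referee or the paper specifies; the function name `wasserstein_1d` and the sample values are placeholders.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sample counts, the optimal transport plan
    pairs the i-th smallest value of one sample with the i-th smallest value
    of the other, so the distance is the mean absolute difference of the
    sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Placeholder feature values, not measurements from the paper.
generated_feature = [0.2, 0.9, 0.5, 0.4]
logged_feature = [0.3, 0.8, 0.6, 0.4]

gap = wasserstein_1d(generated_feature, logged_feature)
```

A small value would suggest that the synthesized negatives and the logged near-misses have similar distributions for that one feature; it says nothing about features that were not measured.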

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the flow matching-based negative trajectory generator produces 'safety-critical yet expert-proximate trajectories' that accurately represent real-world failure modes is load-bearing for the safety improvement and the 89.7 PDMS result. No quantitative validation (e.g., Wasserstein distance to logged near-miss trajectories, distribution overlap metrics, or human realism ratings) is referenced to confirm that these synthetics occupy the same regions of trajectory space as actual deployment failures rather than a narrow subset of kinematic violations.

    Authors: We agree that explicit quantitative validation of the negative trajectories would strengthen the central claim. The generator is constructed to produce expert-proximate failures by conditioning the flow matching process on expert states and applying safety-critical perturbations that remain within a bounded deviation from the expert trajectory (as detailed in Section 3.2). However, we acknowledge the absence of direct distributional comparisons in the current manuscript. In the revised version, we will add quantitative checks including Wasserstein distance and maximum mean discrepancy between the generated negatives and logged near-miss trajectories from the NAVSIM dataset, plus overlap metrics, to demonstrate that the synthetics align with real deployment failure modes rather than only simple kinematic violations. revision: yes

  2. Referee: [Experiments] Experiments (benchmark results and generalization claims): The performance gains and cross-architecture / zero-shot transfer assertions would be more convincing with ablations that isolate the repulsive loss contribution from the base model, negative generator, and diversity sampling. Absence of error bars, multiple random seeds, or statistical significance tests leaves the reliability of the 89.7 PDMS score and generalization results unclear.

    Authors: We concur that additional ablations and statistical reporting would enhance the reliability of the results. We will expand the experimental section with a dedicated ablation study that isolates the repulsive loss, the negative generator, and the diversity-aware sampling strategy, each added incrementally to the Latent TransFuser baseline. We will also rerun the primary experiments across multiple random seeds (minimum of three) and report means with standard deviations. Statistical significance tests (paired t-tests against baselines) will be included for the 89.7 PDMS score and the generalization/transfer results to address concerns about reliability. revision: yes
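
The multi-seed reporting promised in response 2 could look roughly like the sketch below, which computes a mean with sample standard deviation and a paired t statistic over per-seed scores. The function names and the score lists are placeholders chosen for illustration; they are not results from the paper, and turning the statistic into a p-value would additionally require the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(method_scores, baseline_scores):
    """Paired t statistic: mean difference over its standard error."""
    if len(method_scores) != len(baseline_scores) or len(method_scores) < 2:
        raise ValueError("need at least two paired runs")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences are constant; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Placeholder scores for three seeds (NOT numbers reported in the paper).
baseline_runs = [1.0, 1.0, 1.0]
method_runs = [2.0, 3.0, 4.0]

method_mean, method_std = summarize(method_runs)
t_stat = paired_t_statistic(method_runs, baseline_runs)
```

Pairing by seed matters here: the test looks at per-seed differences, so both models should be trained with the same seeds and evaluated on the same scenarios.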

Circularity Check

0 steps flagged

Empirical training framework with independent benchmark validation

Full rationale

The paper presents BeyondDrive as a practical imitation learning method that augments standard training with a flow-matching negative generator, diversity sampling, and a repulsive distance loss. These components are introduced as design choices, trained end-to-end, and evaluated on external closed-loop benchmarks (NAVSIMv1, HUGSIM). No equation or claim reduces a reported performance metric or safety boundary to a fitted parameter by construction, nor does any load-bearing premise rest on a self-citation whose content is itself unverified. The derivation chain is therefore self-contained against the reported empirical results.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that synthesized negatives are representative of real safety failures and that the repulsive loss creates meaningful discriminative boundaries. This ledger was built from the abstract only, so the information available is limited.

free parameters (1)
  • loss weighting coefficients
    The repulsive distance loss and attraction terms likely require tuned weights that are not specified in the abstract.
axioms (1)
  • domain assumption: Synthesized negative trajectories via flow matching accurately capture safety-critical scenarios.
    The method depends on this to establish safety asymmetry without post-hoc validation details in the abstract.

pith-pipeline@v0.9.0 · 5785 in / 1405 out tokens · 44404 ms · 2026-05-20T05:24:46.739082+00:00 · methodology


