Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

Guang Chen; Hangjun Ye; Haochen Tian; Junli Wang; Kun Ma; Long Chen; Qichao Zhang; Xueyi Liu; Zebin Xing; Zhihua Hua

arxiv: 2605.19771 · v1 · pith:SAD3A72Knew · submitted 2026-05-19 · 💻 cs.RO · cs.CV

Pith reviewed 2026-05-20 05:24 UTC · model grok-4.3

keywords: imitation learning, autonomous driving, hard negatives, flow matching, trajectory planning, safety boundaries, end-to-end driving, failure-aware learning

The pith

Imitation learning for end-to-end autonomous driving improves safety by explicitly training on hard negative trajectories that are close to expert paths but lead to failure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning minimizes distance to expert trajectories but assumes spatial closeness guarantees safety, even though nearly identical paths can produce safe recovery or collisions. The paper proposes BeyondDrive to generate safety-critical negative trajectories via flow matching, sample them diversely, and apply a repulsive distance loss that pulls predictions toward experts while pushing them away from negatives. This creates explicit safety boundaries in trajectory space rather than relying on geometric proximity alone. A reader cares because the mismatch between imitation objectives and real safety outcomes limits current end-to-end systems in deployment. If correct, models can discriminate recoverable paths from dangerous ones without needing exhaustive real-world failure data.

Core claim

The authors claim that jointly learning from successful expert demonstrations and synthesized hard negative trajectories, produced by a flow matching generator with diversity-aware sampling, allows a repulsive distance loss to establish discriminative safety boundaries, addressing the objective mismatch where trajectories with similar imitation losses yield different safety results.

What carries the argument

The Repulsive Distance Loss, which attracts model outputs toward expert trajectories while repelling them from safety-critical yet expert-proximate negative trajectories generated by flow matching.
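The attract-and-repel idea can be made concrete with a small sketch. This is not the paper's formulation, which this summary does not state; it assumes a hinge-style margin on the distance to each negative, and the names `mean_l2`, `repulsive_distance_loss`, `margin`, and `lam_rd` are hypothetical.

```python
import numpy as np

def mean_l2(a, b):
    """Mean per-waypoint Euclidean distance between two (T, 2) trajectories."""
    return float(np.linalg.norm(a - b, axis=-1).mean())

def repulsive_distance_loss(pred, expert, negatives, margin=1.0, lam_rd=0.5):
    """Assumed form: d(pred, expert) + lam_rd * mean_k max(0, margin - d(pred, neg_k))."""
    attract = mean_l2(pred, expert)
    repel = np.mean([max(0.0, margin - mean_l2(pred, n)) for n in negatives])
    return attract + lam_rd * float(repel)

# Toy scene: straight expert path and one negative shifted 0.5 m laterally.
t = np.linspace(0.0, 10.0, 8)
expert = np.stack([t, np.zeros_like(t)], axis=-1)
negative = expert + np.array([0.0, 0.5])

# Two predictions with the same imitation error (0.25 m) on opposite sides.
toward_neg = expert + np.array([0.0, 0.25])
away_from_neg = expert - np.array([0.0, 0.25])

loss_toward = repulsive_distance_loss(toward_neg, expert, [negative])
loss_away = repulsive_distance_loss(away_from_neg, expert, [negative])
print(loss_toward, loss_away)
```

Both toy predictions sit 0.25 m from the expert, so a pure imitation loss cannot tell them apart. Under the assumed hinge term the one drifting toward the negative is penalized more, which is the asymmetry the paper is after.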

If this is right

The framework generalizes across uni-modal and multi-modal autonomous driving planners.
It demonstrates zero-shot transfer to additional benchmarks beyond the primary evaluation set.
Models learn explicit distinctions between safe and unsafe behaviors instead of depending solely on spatial closeness to experts.
Diversity-aware sampling during negative generation improves coverage of varied failure modes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Synthesizing negatives could reduce reliance on collecting rare real-world crash data for training.
The repulsive mechanism might extend to other imitation learning settings where positive examples dominate but boundary cases determine reliability.
Closed-loop gains may depend on how well the flow matching distribution matches the actual error distribution of the deployed planner.

Load-bearing premise

The generated negative trajectories accurately represent real-world failure modes that the model would encounter during actual deployment.

What would settle it

Measure whether the trained model shows a lower collision rate than the baseline on closed-loop scenarios that include the synthesized negative trajectories, while keeping comparable expert imitation error on normal driving cases.

Figures

Figures reproduced from arXiv: 2605.19771 by Guang Chen, Hangjun Ye, Haochen Tian, Junli Wang, Kun Ma, Long Chen, Qichao Zhang, Xueyi Liu, Zebin Xing, Zhihua Hua.

**Figure 1.** Learning safe autonomous driving from hard negatives. (a) Uni-modal and multi-modal end-to-end driving models learn by imitating expert demonstrations, yet they lack an understanding of "what constitutes poor imitation." (b) We generate unsafe hard negatives close to expert via flow matching, guiding models to avoid such behaviors. (c) Our approach yields significant performance improvements on reactive (E…

**Figure 2.** Overview of the BeyondDrive framework. In Stage 1, classifier-free guidance and noise standard deviation scaling are employed to generate diverse trajectory candidates near expert demonstrations, followed by safety-aware and distance-aware filtering to construct hard negative samples that are unsafe yet expert-proximate. In Stage 2, the imitation learning loss and repulsive distance loss jointly optimize t…

**Figure 3.** Case studies on the navtest benchmark. Left: front-view camera images. Middle: BEV scenes with expert demonstrations and LTF* planned trajectories (with scores). Right: expert demonstrations, LTFv7 planned trajectories and negative samples.

… as the world model method WoTE. In all three methods, simply by adding one line of training loss code, we achieved PDMS gains of +1.1, +1.3, and +0.9 respectively, demo…

**Figure 4.** Impact of λrd/λimi on model performance.

**Figure 5.** (a) Zero-value counts of hard negative samples. (b) Error distribution between hard negative samples and expert demonstrations. (c) Metric correlations of hard negative samples.

**Figure 6.** Time and resources.

**Figure 7.** Case studies on the HUGSIM benchmark. The ego vehicle avoids a truck, overtakes a bus, and merges into the main road in a closed-loop environment.

**Figure 8.** Case studies on the HUGSIM benchmark. The ego vehicle passes a pedestrian crossing, maneuvers around a stationary car on the roadside, and returns to its current lane in a closed-loop environment.

**Figure 9.** A failure case on the HUGSIM benchmark. In a meeting scenario, the oncoming vehicle swerves into the ego lane, and the ego vehicle fails to avoid the collision.

**Figure 10.** Comparison of planning performance in a straight driving scenario.

**Figure 11.** Comparison of planning performance in a left-turn scenario.

**Figure 12.** Comparison of planning performance in a right-turn scenario.
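The Stage 1 filtering named in the Figure 2 caption (safety-aware plus distance-aware) can be sketched as two thresholds applied together. This is an illustration under stated assumptions, not the authors' pipeline: `filter_hard_negatives`, `toy_safety_score`, `unsafe_below`, and `max_dist` are hypothetical names, and the disc-obstacle scorer merely stands in for whatever safety metric the paper uses.

```python
import numpy as np

def filter_hard_negatives(candidates, expert, safety_score, unsafe_below=0.5, max_dist=1.0):
    """Keep candidates (K, T, 2) that score as unsafe yet stay close to the expert (T, 2)."""
    dists = np.linalg.norm(candidates - expert[None], axis=-1).mean(axis=-1)  # (K,)
    scores = np.array([safety_score(c) for c in candidates])                   # (K,)
    keep = (scores < unsafe_below) & (dists < max_dist)
    return candidates[keep], dists[keep]

def toy_safety_score(traj, obstacle=np.array([5.0, 1.0]), radius=0.6):
    """Hypothetical scorer: 0.0 if any waypoint enters the obstacle disc, else 1.0."""
    hit = np.linalg.norm(traj - obstacle, axis=-1).min() < radius
    return 0.0 if hit else 1.0

# Expert drives straight along y = 0; candidates are lateral shifts of it.
t = np.linspace(0.0, 10.0, 11)
expert = np.stack([t, np.zeros_like(t)], axis=-1)
offsets = [0.0, 0.6, 1.2]
candidates = np.stack([expert + np.array([0.0, dy]) for dy in offsets])

hard_negs, dists = filter_hard_negatives(candidates, expert, toy_safety_score)
print(hard_negs.shape, dists)
```

In this toy setup the unshifted candidate is safe and is dropped, the 1.2 m shift collides but is too far from the expert, and only the 0.6 m shift survives as a hard negative: unsafe and expert-proximate at once.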

Original abstract

Existing imitation learning methods for end-to-end autonomous driving predominantly learn from successful demonstrations by minimizing geometric deviations from expert trajectories. This paradigm implicitly assumes that spatial proximity implies behavioral safety, leading to a critical objective mismatch: trajectories with nearly identical imitation losses may exhibit drastically different safety outcomes, where one remains recoverable while the other results in collision. To address this limitation, we propose BeyondDrive, a failure-aware imitation learning framework that jointly learns from successful and failed driving behaviors. First, we introduce a flow matching-based negative trajectory generator that synthesizes safety-critical yet expert-proximate trajectories, enabling explicit modeling of safety asymmetry. Second, we develop a diversity-aware sampling strategy that mitigates mode collapse and improves coverage of diverse failure modes during negative trajectory generation. Third, we propose a Repulsive Distance Loss that simultaneously attracts predictions toward expert demonstrations while repelling them from hard negative trajectories, thereby establishing discriminative safety boundaries in trajectory space. Applied to the uni-modal baseline Latent TransFuser, BeyondDrive achieves 89.7 PDMS on the NAVSIMv1 closed-loop benchmark, outperforming prior state-of-the-art methods. Moreover, BeyondDrive generalizes effectively across different autonomous driving architectures, including multi-modal planners, and further demonstrates strong zero-shot transferability on the HUGSIM benchmark.
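To make the generator side less abstract, here is a minimal sampler sketch that combines the two knobs named in the Figure 2 caption, classifier-free guidance and noise standard deviation scaling, inside a plain Euler integration of a flow. The velocity fields below are analytic placeholders standing in for a trained network, and `sample_candidates`, `guidance`, and `noise_std` are hypothetical names; none of this is taken from the paper's code.

```python
import numpy as np

def sample_candidates(v_cond, v_uncond, shape, n_steps=20, guidance=1.5, noise_std=1.0, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t=0 (scaled noise) to t=1 (samples).

    guidance:  classifier-free guidance weight w in v = v_u + w * (v_c - v_u)
    noise_std: scale applied to the initial Gaussian noise
    """
    rng = np.random.default_rng(seed)
    x = noise_std * rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        vu, vc = v_uncond(x, t), v_cond(x, t)
        x = x + dt * (vu + guidance * (vc - vu))
    return x

# Placeholder velocity fields (assumption): a real system would use a learned model.
T = 8
expert = np.stack([np.linspace(0.0, 10.0, T), np.zeros(T)], axis=-1)
v_cond = lambda x, t: expert - x   # conditional field pulls toward the expert
v_uncond = lambda x, t: -x         # unconditional stand-in

narrow = sample_candidates(v_cond, v_uncond, (64, T, 2), noise_std=1.0)
wide = sample_candidates(v_cond, v_uncond, (64, T, 2), noise_std=2.0)
spread_ratio = wide.std(axis=0).mean() / narrow.std(axis=0).mean()
print(spread_ratio)
```

Because the placeholder dynamics are linear, doubling `noise_std` doubles the spread of the candidates around the same center. A learned velocity field would not behave this cleanly, but the sketch shows why scaling the initial noise is a cheap lever for candidate diversity.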

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BeyondDrive adds flow-matched hard negatives and a repulsive loss to imitation learning for driving, with a reported lift to 89.7 PDMS, but the negatives' realism remains unverified.

The main point is that this paper tries to fix a real flaw in imitation learning for autonomous driving: geometric closeness to expert trajectories does not guarantee safety, since some near-miss paths look almost identical to good ones under standard losses. BeyondDrive generates synthetic negatives with flow matching, samples them for diversity, and adds a repulsive distance term that pulls predictions toward experts while pushing them away from the bad ones. On the Latent TransFuser baseline it reaches 89.7 PDMS on NAVSIMv1 closed-loop and shows some generalization to multi-modal planners plus zero-shot transfer on HUGSIM. That is the concrete result worth noting first.

The framing of safety asymmetry through explicit repulsion is a clear step beyond plain behavioral cloning or simple negative sampling. The diversity mechanism is a reasonable addition to keep the negatives from collapsing to one kind of failure. The benchmark numbers are stated directly and the method is applied across a couple of architectures, which gives it some practical flavor.

The soft spot is exactly the one the stress-test flags. The flow model is conditioned on expert states to produce proximate negatives, but the abstract gives no quantitative check that these trajectories sit in the same regions as real logged near-misses or recoverable failures. Without something like a distribution distance to actual failure data or human ratings, it is possible the repulsion is only carving out an artificial boundary that does not survive deployment shift. The loss weighting coefficients are also free parameters that could be tuned to the specific benchmark.

This work is for researchers already working on end-to-end driving policies who want to add failure awareness without moving to full reinforcement learning. A reader who cares about safety margins in imitation setups will see a usable idea here.

The paper has a focused claim, a measurable improvement, and enough structure to deserve referee time rather than a desk reject. I would send it out for review so the negative-generation step and the closed-loop ablations can be examined in detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces BeyondDrive, a failure-aware imitation learning framework for end-to-end autonomous driving. It augments standard imitation from expert trajectories with a flow matching-based negative trajectory generator that produces safety-critical yet expert-proximate failures, a diversity-aware sampling strategy to avoid mode collapse, and a Repulsive Distance Loss that attracts predictions to experts while repelling them from the hard negatives. Applied to the Latent TransFuser baseline, the method reports 89.7 PDMS on the NAVSIMv1 closed-loop benchmark (outperforming prior SOTA), effective generalization to multi-modal planners, and strong zero-shot transfer to the HUGSIM benchmark.

Significance. If the generated negatives faithfully capture real-world failure mode distributions, the approach directly addresses the objective mismatch between geometric imitation loss and actual safety outcomes, offering a principled way to learn discriminative safety boundaries. The reported benchmark gains, cross-architecture generalization, and zero-shot transfer results would represent a meaningful empirical advance in safe end-to-end driving if the central assumption about negative trajectory quality is substantiated with quantitative checks.

major comments (2)

[Abstract] The claim that the flow matching-based negative trajectory generator produces 'safety-critical yet expert-proximate trajectories' that accurately represent real-world failure modes is load-bearing for the safety improvement and the 89.7 PDMS result. No quantitative validation (e.g., Wasserstein distance to logged near-miss trajectories, distribution overlap metrics, or human realism ratings) is referenced to confirm that these synthetics occupy the same regions of trajectory space as actual deployment failures rather than a narrow subset of kinematic violations.
[Experiments] Experiments (benchmark results and generalization claims): The performance gains and cross-architecture / zero-shot transfer assertions would be more convincing with ablations that isolate the repulsive loss contribution from the base model, negative generator, and diversity sampling. Absence of error bars, multiple random seeds, or statistical significance tests leaves the reliability of the 89.7 PDMS score and generalization results unclear.
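The distributional check requested in the first major comment could take several forms. One option, sketched here on synthetic data, is a kernel two-sample statistic (maximum mean discrepancy with an RBF kernel) between flattened generated negatives and flattened logged failures. The function name `rbf_mmd2`, the bandwidth, and the random arrays are illustrative assumptions; no real trajectory data is used.

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x (n, d) and y (m, d)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return float(k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean())

rng = np.random.default_rng(0)
# Synthetic stand-ins: each row plays the role of a flattened trajectory (8 waypoints x 2).
logged = rng.normal(0.0, 1.0, size=(200, 16))
generated_close = rng.normal(0.0, 1.0, size=(200, 16))    # same distribution
generated_shifted = rng.normal(1.0, 1.0, size=(200, 16))  # shifted distribution

mmd_close = rbf_mmd2(logged, generated_close, bandwidth=4.0)
mmd_shifted = rbf_mmd2(logged, generated_shifted, bandwidth=4.0)
print(mmd_close, mmd_shifted)
```

A small value for generated negatives against logged near-misses would support the load-bearing premise; a large one would suggest the generator covers a different region of trajectory space. The bandwidth matters in practice and is often set from the median pairwise distance.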

minor comments (2)

[Method] Clarify the precise mathematical definition of the Repulsive Distance Loss (including any weighting coefficients) with an explicit equation to support reproducibility.
[Figures] Ensure trajectory visualizations in figures clearly label expert vs. negative samples and illustrate the diversity achieved by the sampling strategy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Referee: [Abstract] The claim that the flow matching-based negative trajectory generator produces 'safety-critical yet expert-proximate trajectories' that accurately represent real-world failure modes is load-bearing for the safety improvement and the 89.7 PDMS result. No quantitative validation (e.g., Wasserstein distance to logged near-miss trajectories, distribution overlap metrics, or human realism ratings) is referenced to confirm that these synthetics occupy the same regions of trajectory space as actual deployment failures rather than a narrow subset of kinematic violations.

Authors: We agree that explicit quantitative validation of the negative trajectories would strengthen the central claim. The generator is constructed to produce expert-proximate failures by conditioning the flow matching process on expert states and applying safety-critical perturbations that remain within a bounded deviation from the expert trajectory (as detailed in Section 3.2). However, we acknowledge the absence of direct distributional comparisons in the current manuscript. In the revised version, we will add quantitative checks including Wasserstein distance and maximum mean discrepancy between the generated negatives and logged near-miss trajectories from the NAVSIM dataset, plus overlap metrics, to demonstrate that the synthetics align with real deployment failure modes rather than only simple kinematic violations. revision: yes
Referee: [Experiments] Experiments (benchmark results and generalization claims): The performance gains and cross-architecture / zero-shot transfer assertions would be more convincing with ablations that isolate the repulsive loss contribution from the base model, negative generator, and diversity sampling. Absence of error bars, multiple random seeds, or statistical significance tests leaves the reliability of the 89.7 PDMS score and generalization results unclear.

Authors: We concur that additional ablations and statistical reporting would enhance the reliability of the results. We will expand the experimental section with a dedicated ablation study that isolates the repulsive loss, the negative generator, and the diversity-aware sampling strategy, each added incrementally to the Latent TransFuser baseline. We will also rerun the primary experiments across multiple random seeds (minimum of three) and report means with standard deviations. Statistical significance tests (paired t-tests against baselines) will be included for the 89.7 PDMS score and the generalization/transfer results to address concerns about reliability. revision: yes
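The seed-level reporting promised in the second response can be sketched with the standard library alone. The per-seed scores below are made-up placeholders chosen for easy arithmetic, not results from the paper, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1

# Placeholder per-seed scores (illustrative only), matched by seed.
baseline = [10.0, 11.0, 12.0]
variant = [11.0, 13.0, 13.0]

t_stat, dof = paired_t(variant, baseline)
mean_variant = statistics.mean(variant)
std_variant = statistics.stdev(variant)
print(t_stat, dof, mean_variant, std_variant)
```

With three seeds there are only two degrees of freedom, and the two-sided 5% critical value of Student's t at that setting is about 4.30. The toy statistic of 4.0 therefore falls short despite every seed improving, a reminder that the stated minimum of three seeds leaves little statistical power.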

Circularity Check

0 steps flagged

Empirical training framework with independent benchmark validation

The paper presents BeyondDrive as a practical imitation learning method that augments standard training with a flow-matching negative generator, diversity sampling, and a repulsive distance loss. These components are introduced as design choices, trained end-to-end, and evaluated on external closed-loop benchmarks (NAVSIMv1, HUGSIM). No equation or claim reduces a reported performance metric or safety boundary to a fitted parameter by construction, nor does any load-bearing premise rest on a self-citation whose content is itself unverified. The derivation chain is therefore self-contained against the reported empirical results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that synthesized negatives are representative of real safety failures and that the repulsive loss creates meaningful discriminative boundaries. This ledger is built from the abstract only, so the information behind it is limited.

free parameters (1)

loss weighting coefficients
The repulsive distance loss and attraction terms likely require tuned weights that are not specified in the abstract.
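One plausible reading of how these weights enter, assuming a simple weighted sum and borrowing the symbols λimi and λrd from the Figure 4 caption, is shown below. The exact forms of the two terms are not given in this summary, so the equation is a sketch and not the paper's stated objective.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{imi}}\,\mathcal{L}_{\text{imi}}
  + \lambda_{\text{rd}}\,\mathcal{L}_{\text{rd}}
```

Under this reading only the ratio λrd/λimi changes which prediction minimizes the objective, since rescaling both weights by the same factor rescales the whole loss. That would be consistent with Figure 4 reporting performance against the ratio.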

axioms (1)

domain assumption: Synthesized negative trajectories via flow matching accurately capture safety-critical scenarios.
The method depends on this to establish safety asymmetry without post-hoc validation details in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We propose a Repulsive Distance Loss that simultaneously attracts predictions toward expert demonstrations while repelling them from hard negative trajectories, thereby establishing discriminative safety boundaries in trajectory space."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "flow matching-based negative trajectory generator that synthesizes safety-critical yet expert-proximate trajectories"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 9 internal anchors

[1]

In: The Eleventh International Conference on Learning Representa- tions, ICLR 2023 5

Albergo, M.S., Vanden-Eijnden, E.: Building normalizing flows with stochastic interpolants. In: The Eleventh International Conference on Learning Representa- tions, ICLR 2023 5

work page 2023
[2]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11621–11631 (2020) 11

work page 2020
[3]

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Caesar, H., Kabzan, J., Tan, K.S., Fong, W.K., Wolff, E., Lang, A., Fletcher, L., Beijbom, O., Omari, S.: nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810 (2021) 9

work page internal anchor Pith review Pith/arXiv arXiv 2021
[4]

In: 9th Annual Conference on Robot Learning 9

Cao, W., Hallgarten, M., Li, T., Dauner, D., Gu, X., Wang, C., Miron, Y., Aiello, M., Li, H., Gilitschenski, I., et al.: Pseudo-simulation for autonomous driving. In: 9th Annual Conference on Robot Learning 9

work page
[5]

Advances in neural information processing systems31(2018) 5

Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary dif- ferential equations. Advances in neural information processing systems31(2018) 5

work page 2018
[6]

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

Chen,S.,Jiang,B.,Gao,H.,Liao,B.,Xu,Q.,Zhang,Q.,Huang,C.,Liu,W.,Wang, X.: Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243 (2024) 2, 4, 10

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Chen, Y., Wang, Y., Zhang, Z.: Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26890–26900 (2025) 10

work page 2025
[8]

Dao, Q., Phung, H., Nguyen, B., and Tran, A

Dao, Q., Phung, H., Nguyen, B., Tran, A.: Flow matching in latent space. arXiv preprint arXiv:2307.08698 (2023) 5

work page arXiv 2023
[9]

Advances in Neural Information Processing Systems37, 28706–28719 (2024) 3, 8, 9, 10, 11, 12, 20

Dauner, D., Hallgarten, M., Li, T., Weng, X., Huang, Z., Yang, Z., Li, H., Gilitschenski, I., Ivanovic, B., Pavone, M., et al.: Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems37, 28706–28719 (2024) 3, 8, 9, 10, 11, 12, 20

work page 2024
[10]

IEEE Robotics and Automation Letters11(1), 226–233 (2025) 11

Feng, R., Xi, N., Chu, D., Wang, R., Deng, Z., Wang, A., Lu, L., Wang, J., Huang, Y.: Artemis: Autoregressive end-to-end trajectory planning with mixture of experts for autonomous driving. IEEE Robotics and Automation Letters11(1), 226–233 (2025) 11

work page 2025
[11]

In: International Conference on Multimedia Modeling

Gan, W., Dao, M.S., Zettsu, K.: Drive-clip: Cross-modal contrastive safety-critical driving scenario representation learning and zero-shot driving risk analysis. In: International Conference on Multimedia Modeling. pp. 82–97. Springer (2024) 4

work page 2024
[12]

arXiv preprint arXiv:2603.14972 (2026) 4

Gao, Y., Liu, D., Zhang, Q., Zheng, Y., Tian, H., Li, G., Ye, H., Chen, L., Ding, D.W.,Zhao,D.:Learningfrommistakes:Post-trainingfordrivingvlawithtakeover data. arXiv preprint arXiv:2603.14972 (2026) 4

work page arXiv 2026
[13]

IEEE Robotics and Automation Letters (2026) 4

Gao, Y., Zhang, Q., Liu, D., Xia, Z., Li, G., Ma, K., Chen, G., Ye, H., Chen, L., Ding, D.W., et al.: Perlad: Towards enhanced closed-loop end-to-end autonomous driving with pseudo-simulation-based reinforcement learning. IEEE Robotics and Automation Letters (2026) 4

work page 2026
[14]

Advances in Neural Information Processing Systems38, 75460–75482 (2026) 5

Geng, Z., Deng, M., Bai, X., Kolter, Z., He, K.: Mean flows for one-step generative modeling. Advances in Neural Information Processing Systems38, 75460–75482 (2026) 5

work page 2026
[15]

In: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications 5 16

Ho, J., Salimans, T.: Classifier-free diffusion guidance. In: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications 5 16

work page 2021
[16]

Neurocomputing610, 128645 (2024) 2

Hu, H., Wang, X., Zhang, Y., Chen, Q., Guan, Q.: A comprehensive survey on contrastive learning. Neurocomputing610, 128645 (2024) 2

work page 2024
[17]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17853– 17862 (2023) 2, 4, 10, 11

work page 2023
[18]

Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

Hua, Z., Wang, J., Li, P., Jin, Q., Zhang, B., Sheng, K., Chen, Y., Gan, Z., Ding, W.: Unveiling the surprising efficacy of navigation understanding in end-to-end autonomous driving. arXiv preprint arXiv:2604.12208 (2026) 10

work page internal anchor Pith review Pith/arXiv arXiv 2026
[19]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Jiang, B., Chen, S., Xu, Q., Liao, B., Chen, J., Zhou, H., Zhang, Q., Liu, W., Huang,C.,Wang,X.:Vad:Vectorizedscenerepresentationforefficientautonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8340–8350 (2023) 2, 4, 11

work page 2023
[20]

In: The Thirteenth International Conference on Learning Representations 10

Li, Y., Fan, L., He, J., Wang, Y., Chen, Y., Zhang, Z., Tan, T.: Enhancing end-to- end autonomous driving with latent world model. In: The Thirteenth International Conference on Learning Representations 10

work page
[21]

Li, Y., Wang, Y., Liu, Y., He, J., Fan, L., Zhang, Z.: End-to-end driving with onlinetrajectoryevaluationviabevworldmodel.In:ProceedingsoftheIEEE/CVF International Conference on Computer Vision. pp. 27137–27146 (2025) 2, 3, 4, 10, 12, 20

work page 2025
[22]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Li, Z., Li, K., Wang, S., Lan, S., Yu, Z., Ji, Y., Li, Z., Zhu, Z., Kautz, J., Wu, Z., et al.: Hydra-mdp: End-to-end multimodal planning with multi-target hydra- distillation. arXiv preprint arXiv:2406.06978 (2024) 4, 10

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

In: Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision

Li, Z., Wang, S., Lan, S., Yu, Z., Wu, Z., Alvarez, J.M.: Hydra-next: Robust closed- loop driving with open-loop training. In: Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision. pp. 27305–27314 (2025) 10

work page 2025
[24]

Liao, B., Chen, S., Yin, H., Jiang, B., Wang, C., Yan, S., Zhang, X., Li, X., Zhang, Y., Zhang, Q., et al.: Diffusiondrive: Truncated diffusion model for end-to-end au- tonomousdriving.In:ProceedingsoftheComputerVisionandPatternRecognition Conference. pp. 12037–12047 (2025) 2, 3, 4, 10, 11, 12, 20

work page 2025
[25]

IEEE Transactions on Pattern Analysis and Machine Intelligence45(3), 3292–3310 (2022) 11

Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Transactions on Pattern Analysis and Machine Intelligence45(3), 3292–3310 (2022) 11

work page 2022
[26]

In: The Eleventh International Conference on Learning Rep- resentations, ICLR 2023 (2023) 2, 5

Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: The Eleventh International Conference on Learning Rep- resentations, ICLR 2023 (2023) 2, 5

work page 2023
[27]

IEEE Robotics and Automation Letters11(2), 1738–1745 (2025) 4

Liu, D., Gao, Y., Qian, D., Zhang, Q., Ye, X., Han, J., Zheng, Y., Liu, X., Xia, Z., Ding, D., et al.: Takead: Preference-based post-optimization for end-to-end autonomous driving with expert takeover data. IEEE Robotics and Automation Letters11(2), 1738–1745 (2025) 4

work page 2025
[28]

arXiv preprint arXiv:2509.23589 (2025) 10

Liu, S., Chen, W., Li, W., Wang, Z., Yang, L., Huang, J., Zhang, Y., Huang, Z., Cheng, Z., Yang, H.: Bridgedrive: Diffusion bridge policy for closed-loop trajectory planning in autonomous driving. arXiv preprint arXiv:2509.23589 (2025) 10

work page arXiv 2025
[29]

In: The Eleventh International Conference on Learning Representations, ICLR 2023 5

Liu, X., Gong, C., et al.: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: The Eleventh International Conference on Learning Representations, ICLR 2023 5

work page 2023
[30]

In: Conference on Robot Learning

Liu, X., Zhong, Z., Zhang, Q., Guo, Y., Zheng, Y., Wang, J., Zhao, D., Liu, Y.F., Su, Z., Gao, Y., et al.: Reasonplan: Unified scene prediction and decision reasoning for closed-loop autonomous driving. In: Conference on Robot Learning. pp. 3051–

work page
[31]

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Lu, J., Guan, J., Huang, Z., Li, J., Li, G., Kong, L., Li, Y., Wang, H., Xu, S., Luo, Y., et al.: Onevl: One-step latent reasoning and planning with vision-language explanation. arXiv preprint arXiv:2604.18486 (2026) 4

work page internal anchor Pith review Pith/arXiv arXiv 2026
[32]

LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

Nguyen, L., Fauth, M., Jaeger, B., Dauner, D., Igl, M., Geiger, A., Chitta, K.: Lead: Minimizing learner-expert asymmetry in end-to-end driving. arXiv preprint arXiv:2512.20563 (2025) 10

work page internal anchor Pith review Pith/arXiv arXiv 2025
[33]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Prakash, A., Chitta, K., Geiger, A.: Multi-modal fusion transformer for end-to-end autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7077–7087 (2021) 2, 4, 7, 10, 11, 12, 20

work page 2021
[34]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Rao, B., Liao, H., Guan, Y., Wang, C., Wang, B., Zhang, J., Li, Z.: Amd: Adap- tive momentum and decoupled contrastive learning framework for robust long-tail trajectory prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 28849–28858 (2025) 4

work page 2025
[35]

In: 2025 International Joint Conference on Neural Networks (IJCNN)

Rosin, G., Rahman, M.R.U., Vascon, S.: Ecam: A contrastive learning approach to avoid environmental collision in trajectory forecasting. In: 2025 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. IEEE (2025) 4

work page 2025
[36]

Drivedpo: Policy learning via safety dpo for end-to-end autonomous driving.arXiv preprint arXiv:2509.17940, 2025

Shang, S., Chen, Y., Wang, Y., Li, Y., Zhang, Z.: Drivedpo: Policy learning via safety dpo for end-to-end autonomous driving. arXiv preprint arXiv:2509.17940 (2025) 4

work page arXiv 2025
[37]

Denoising Diffusion Implicit Models

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020) 2

[38]

Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2446–2454 (2020)

[39]

Tian, H., Li, T., Liu, H., Yang, J., Qiu, Y., Li, G., Wang, J., Gao, Y., Zhang, Z., Wang, L., et al.: Simscale: Learning to drive via real-world simulation at scale. arXiv preprint arXiv:2511.23369 (2025)

[40]

Wang, J., Li, G., Huang, Z., Dang, C., Ye, H., Han, Y., Chen, L.: Vggdrive: Empowering vision-language models with cross-view geometric grounding for autonomous driving. arXiv preprint arXiv:2602.20794 (2026)

[41]

Wang, J., Liu, X., Zheng, Y., Xing, Z., Li, P., Li, G., Ma, K., Chen, G., Ye, H., Xia, Z., et al.: Meanfuser: Fast one-step multi-modal trajectory generation and adaptive reconstruction via meanflow for end-to-end autonomous driving. arXiv preprint arXiv:2602.20060 (2026)

[42]

Wang, J., Zhang, Q.: Consistencydrive: Efficient end-to-end autonomous driving with consistency models. In: 2025 10th International Conference on Control, Robotics and Cybernetics (CRC). pp. 57–62. IEEE (2025)

[43]

Weng, X., Ivanovic, B., Wang, Y., Wang, Y., Pavone, M.: Para-drive: Parallelized architecture for real-time autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15449–15458 (2024)

[44]

Xia, T., Li, Y., Zhou, L., Yao, J., Xiong, K., Sun, H., Wang, B., Ma, K., Chen, G., Ye, H., et al.: Drivelaw: Unifying planning and video generation in a latent driving world. arXiv preprint arXiv:2512.23421 (2025)

[45]

Xiao, P., Shao, Z., Hao, S., Zhang, Z., Chai, X., Jiao, J., Li, Z., Wu, J., Sun, K., Jiang, K., et al.: Pandaset: Advanced sensor suite dataset for autonomous driving. In: 2021 IEEE international intelligent transportation systems conference (ITSC). pp. 3095–3101. IEEE (2021)

[46]

Yang, P., Lu, B., Xia, Z., Han, C., Gao, Y., Zhang, T., Zhan, K., Lang, X., Zheng, Y., Zhang, Q.: Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving. arXiv preprint arXiv:2512.19133 (2025)

[47]

Yao, W., Li, Z., Lan, S., Wang, Z., Sun, X., Alvarez, J.M., Wu, Z.: Drivesuprim: Towards precise trajectory selection for end-to-end planning. arXiv preprint arXiv:2506.06659 (2025)

[48]

Zhang, J., Pourkeshavarz, M., Rasouli, A.: Tract: A training dynamics aware contrastive learning framework for long-tail trajectory prediction. In: 2024 IEEE Intelligent Vehicles Symposium (IV). pp. 3282–3288. IEEE (2024)

[49]

Zhang, K., Tang, Z., Hu, X., Pan, X., Guo, X., Liu, Y., Huang, J., Yuan, L., Zhang, Q., Long, X.X., et al.: Epona: Autoregressive diffusion world model for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 27220–27230 (2025)

[50]

Zhao, Z., Fu, T., Wang, Y., Wang, L., Lu, H.: From forecasting to planning: Policy world model for collaborative state-action prediction. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems

[51]

Zhou, H., Lin, L., Wang, J., Lu, Y., Bai, D., Liu, B., Wang, Y., Geiger, A., Liao, Y.: Hugsim: A real-time, photo-realistic and closed-loop simulator for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

[52]

Zhou, Z., Cai, T., Zhao, S.Z., Zhang, Y., Huang, Z., Zhou, B., Ma, J.: Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems

