PlanRL: A Trajectory Planning Architecture for Reinforcement Learning-based Driving Experts

Dongsuk Kum; Jangho Shin; Joonhee Lim; Yongjae Lee

arxiv: 2606.26858 · v1 · pith:C3NHED24new · submitted 2026-06-25 · 💻 cs.RO


Pith reviewed 2026-06-26 04:41 UTC · model grok-4.3

classification 💻 cs.RO

keywords: reinforcement learning, trajectory planning, autonomous driving, Frenet frame, CARLA simulator, kinematic constraints, polynomial planner, driving policy


The pith

RL driving experts improve by planning trajectories in Frenet coordinates with kinematic checks rather than outputting direct controls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a trajectory planning architecture that pairs an RL policy with a polynomial-based planner for autonomous driving tasks. It converts road geometries into a Frenet-frame curvilinear system to give the policy a structured coordinate prior that eases learning. A kinematic feasibility check is added during planning to keep outputs inside vehicle limits and cut cumulative tracking errors. Tests on the CARLA Offline Leaderboard v1 and NoCrash benchmarks show that the method raises driving scores by 5 and 11 percent, respectively, and success rates by 8 and 19 percent over prior control-based RL experts. The goal is greater interpretability and a better fit with modern planning pipelines.

Core claim

By employing a Frenet-frame coordinate system, our method simplifies complex road geometries into a curvilinear framework, offering a structured coordinate prior that facilitates policy learning. Furthermore, we incorporate a kinematic feasibility check into the planning stage to ensure that generated trajectories remain within the vehicle's physical limits, effectively mitigating cumulative tracking errors typically found in planning-based systems. We evaluate our approach on key CARLA benchmarks, where it significantly outperforms existing state-of-the-art control-based RL experts. On the CARLA Offline Leaderboard v1 and NoCrash benchmarks, our method improves the driving score by 5% and 1

What carries the argument

RL policy integrated with polynomial-based trajectory planner in Frenet-frame coordinates plus kinematic feasibility check.
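As an illustration of the kind of polynomial planner named here, the sketch below fits a quintic polynomial to the lateral Frenet offset d(t) between a start state and a terminal state. The quintic order, the function names `quintic_coeffs` and `evaluate`, and the boundary values are assumptions made for illustration; the summary does not specify the polynomial order or the planner's interface.

```python
import numpy as np

def quintic_coeffs(d0, v0, a0, d1, v1, a1, T):
    """Coefficients c0..c5 of d(t) = sum_k c_k t^k that match lateral
    position, velocity and acceleration at t = 0 and t = T."""
    c0, c1, c2 = d0, v0, 0.5 * a0
    # Remaining three coefficients come from the terminal conditions.
    A = np.array([
        [T**3,     T**4,      T**5],
        [3 * T**2, 4 * T**3,  5 * T**4],
        [6 * T,    12 * T**2, 20 * T**3],
    ])
    b = np.array([
        d1 - (c0 + c1 * T + c2 * T**2),
        v1 - (c1 + 2 * c2 * T),
        a1 - 2 * c2,
    ])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])

def evaluate(coeffs, t):
    """Evaluate the polynomial at time(s) t."""
    # np.polyval expects the highest-order coefficient first.
    return np.polyval(coeffs[::-1], t)

# Hypothetical manoeuvre: shift 1.5 m laterally in 3 s, at rest laterally at both ends.
coeffs = quintic_coeffs(0.0, 0.0, 0.0, 1.5, 0.0, 0.0, T=3.0)
samples = evaluate(coeffs, np.linspace(0.0, 3.0, 7))
```

In a setup like this, a learned policy would only need to choose the terminal state (for example the target offset and horizon), and the polynomial supplies a smooth path between the two states.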

If this is right

Road geometries become simpler for the RL policy to learn because they are expressed in a curvilinear Frenet frame.
Generated trajectories stay inside vehicle physical limits, reducing cumulative tracking errors.
The outputs are more interpretable than direct throttle and steering commands.
The architecture is more compatible with end-to-end planning systems than pure control-based RL.
Performance on CARLA benchmarks rises by the reported margins over prior control-based experts.
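To make the first point concrete, the sketch below projects a Cartesian point onto a polyline reference path and returns Frenet coordinates: arc length s along the path and signed lateral offset d. The function name `cartesian_to_frenet`, the polyline representation, and the sign convention (positive d to the left) are assumptions for illustration and are not taken from the paper. The sketch also assumes no zero-length segments.

```python
import numpy as np

def cartesian_to_frenet(point, path):
    """Project a 2-D point onto a polyline reference path.
    Returns (s, d): arc length along the path and signed lateral offset,
    positive to the left of the direction of travel."""
    point = np.asarray(point, dtype=float)
    path = np.asarray(path, dtype=float)
    seg = path[1:] - path[:-1]                      # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    s_start = np.concatenate([[0.0], np.cumsum(seg_len)[:-1]])
    rel = point - path[:-1]                         # point relative to segment starts
    # Fraction along each segment of the closest point, clamped to the segment.
    u = np.clip(np.einsum("ij,ij->i", rel, seg) / seg_len**2, 0.0, 1.0)
    foot = path[:-1] + u[:, None] * seg
    dist = np.linalg.norm(point - foot, axis=1)
    i = int(np.argmin(dist))                        # nearest segment
    # Sign of the 2-D cross product separates left (+) from right (-).
    cross = seg[i, 0] * rel[i, 1] - seg[i, 1] * rel[i, 0]
    return s_start[i] + u[i] * seg_len[i], np.sign(cross) * dist[i]

# Hypothetical L-shaped reference path: 10 m east, then 10 m north.
path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s, d = cartesian_to_frenet((4.0, 1.0), path)       # first segment, 1 m to the left
s2, d2 = cartesian_to_frenet((11.0, 5.0), path)    # second segment, 1 m to the right
```

Both points sit one metre from the centreline, so in (s, d) they differ only in arc length and sign even though the road turns between them, which is the simplification the Frenet frame is meant to provide.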

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same Frenet-plus-kinematic structure could be tested in other continuous-control settings that involve curved paths and actuator limits.
Replacing the polynomial planner with learned trajectory generators might preserve the coordinate prior while increasing flexibility.
Real-vehicle deployment would require mapping sensor data into the Frenet frame without introducing new errors.
Combining the architecture with imitation learning pre-training could further lower sample needs on the CARLA tasks.

Load-bearing premise

The RL policy learns effectively from the simplified Frenet coordinate prior and the kinematic check sufficiently prevents cumulative tracking errors.

What would settle it

Running the architecture on the CARLA Offline Leaderboard v1 and NoCrash benchmarks and finding driving scores or success rates no higher than those of existing control-based RL experts.

Figures

Figures reproduced from arXiv: 2606.26858 by Dongsuk Kum, Jangho Shin, Joonhee Lim, Yongjae Lee.

**Figure 2.** Overall architecture of the proposed RL-based expert, PlanRL. The RL Policy Network Module outputs a high-level command using BEV segmentation images and an ego measurement vector as inputs. The Feasibility Check Module adjusts the terminal lateral state to satisfy kinematic constraints. The Trajectory Planning Module then generates a smooth trajectory based on the adjusted terminal state.
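The caption says the Feasibility Check Module adjusts the terminal lateral state to satisfy kinematic constraints, but it does not say how. The sketch below shows one possible form of such a check, under stated assumptions: a rest-to-rest quintic lateral profile over a longitudinal distance `horizon_s`, a straight reference line, a small-slope approximation in which path curvature is taken as the second derivative of the lateral offset, and a curvature limit of tan(max_steer) / wheelbase from a kinematic bicycle model. All names and numbers are hypothetical.

```python
import math

# Peak of |60u - 180u^2 + 120u^3| on [0, 1], the normalized second
# derivative of the rest-to-rest quintic 10u^3 - 15u^4 + 6u^5.
PEAK_NORMALIZED_CURVATURE = 10.0 / math.sqrt(3.0)

def max_curvature(delta_d, horizon_s):
    """Approximate peak curvature of a lateral shift delta_d over horizon_s."""
    return PEAK_NORMALIZED_CURVATURE * abs(delta_d) / horizon_s**2

def adjust_terminal_offset(d_start, d_target, horizon_s, wheelbase, max_steer):
    """Pull the terminal lateral offset toward d_start until the curvature
    limit of a kinematic bicycle model is respected."""
    kappa_max = math.tan(max_steer) / wheelbase
    delta_d = d_target - d_start
    limit = kappa_max * horizon_s**2 / PEAK_NORMALIZED_CURVATURE
    if abs(delta_d) <= limit:
        return d_target                       # already feasible, keep as is
    return d_start + math.copysign(limit, delta_d)

# Hypothetical numbers: 3.5 m shift over 8 m, 2.9 m wheelbase, 35 degree steering limit.
d_adj = adjust_terminal_offset(0.0, 3.5, horizon_s=8.0,
                               wheelbase=2.9, max_steer=math.radians(35))
```

Because the approximate curvature is linear in the lateral shift, the clamp can be computed in closed form; a check based on exact curvature or on a curved reference would need a numerical search instead.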

**Figure 3.** Visualization of driving scenarios in NCd-Town02 for (a) Roach and …

Original abstract

Reinforcement learning (RL) has become a prominent framework for developing driving experts in autonomous vehicles. However, most existing RL-based experts are designed to output direct control commands (e.g., throttle, steering), which suffer from a lack of interpretability, high spatial complexity in learning road geometries, and poor compatibility with modern end-to-end planning architectures. To address these limitations, we propose a novel trajectory planning architecture for RL driving experts that integrates an RL policy with a polynomial-based trajectory planner. By employing a Frenet-frame coordinate system, our method simplifies complex road geometries into a curvilinear framework, offering a structured coordinate prior that facilitates policy learning. Furthermore, we incorporate a kinematic feasibility check into the planning stage to ensure that generated trajectories remain within the vehicle's physical limits, effectively mitigating cumulative tracking errors typically found in planning-based systems. We evaluate our approach on key CARLA benchmarks, where it significantly outperforms existing state-of-the-art control-based RL experts. On the CARLA Offline Leaderboard v1 and NoCrash benchmarks, our method improves the driving score by 5% and 11%, respectively, and increases the success rate by 8% and 19%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PlanRL wraps RL with a Frenet polynomial planner and reports modest CARLA gains, but the evaluation does not isolate what actually produces the lift.


The core of this paper is an RL policy whose outputs feed a polynomial trajectory generator in Frenet coordinates, with an added kinematic feasibility filter before execution. The authors test the full pipeline on CARLA Offline Leaderboard v1 and NoCrash and claim 5% and 11% higher driving scores plus 8% and 19% higher success rates than prior control-based RL agents.

What is actually new is the explicit coupling of the learned policy to a classical planner that produces kinematically feasible trajectories instead of raw throttle/steering commands. The Frenet representation and the feasibility check are standard tools, but putting them together in this way for an RL driving expert is a reasonable engineering step that could improve compatibility with existing planning stacks.

The evaluation is the soft spot. The paper only shows the complete system against control-only baselines; there are no ablations that remove the Frenet transform, turn off the kinematic check, or replace the planner with a direct-control head while holding the policy and training fixed. Without those comparisons it is impossible to attribute the reported deltas to the two advertised mechanisms rather than to differences in action space, reward design, or hyper-parameters. The abstract gives the headline numbers but supplies no protocol details, variance estimates, or statistical tests, so the strength of the evidence remains unclear even after reading the full text.

This work is aimed at people already building hybrid learning-plus-planning systems for autonomous driving. A reader who needs another data point on CARLA might extract the architecture description and the raw scores, but anyone hoping to reuse the claimed improvements will have to re-run the experiments themselves.

I would send it for peer review. The idea is straightforward enough that referees can quickly judge whether the missing controls are fatal or fixable, and the subfield can use more documented attempts to move RL outputs into trajectory space.

Referee Report

2 major / 1 minor

Summary. The paper proposes PlanRL, a hybrid architecture that couples an RL policy to a polynomial trajectory planner operating in Frenet-frame coordinates and augmented by an explicit kinematic feasibility check. It claims that the Frenet prior simplifies road geometry for policy learning and that the feasibility check prevents cumulative tracking errors, yielding 5% and 11% higher driving scores together with 8% and 19% higher success rates versus prior control-based RL experts on the CARLA Offline Leaderboard v1 and NoCrash benchmarks.

Significance. If the performance deltas can be shown to arise specifically from the two advertised mechanisms rather than from differences in action space, reward design or hyper-parameters, the work would supply a concrete, interpretable bridge between end-to-end RL control and classical planning pipelines, which is a practically relevant direction for autonomous-driving research.

major comments (2)

[Evaluation] Evaluation section (as summarized in the abstract and described in the results): the manuscript reports aggregate improvements of the full PlanRL pipeline over control-based RL baselines but contains no ablation that removes the Frenet-frame transformation, disables the kinematic feasibility check, or replaces the planner with a direct-control head while holding the RL policy and training regime fixed. Consequently, the central attribution (that the reported 5–11% driving-score and 8–19% success-rate gains are produced by the Frenet prior and feasibility check) remains unsupported.
[Abstract] Abstract and implied experimental protocol: no description is given of the precise CARLA versions, traffic densities, number of evaluation episodes, random seeds, or statistical tests used to establish the quoted percentage improvements, rendering it impossible to judge whether the numerical claims are reproducible or statistically reliable.
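One way to supply the kind of statistical support the second comment asks for is a percentile bootstrap confidence interval over per-episode outcomes. The sketch below is generic and is not the paper's protocol; the function name `bootstrap_diff_ci` is hypothetical, and the outcome lists are synthetic placeholders, not results from the paper.

```python
import random

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), where a and b are
    lists of per-episode outcomes (1 = success, 0 = failure)."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))         # resample with replacement
        rb = rng.choices(b, k=len(b))
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic placeholder outcomes, NOT taken from the paper.
planner_runs = [1] * 80 + [0] * 20
baseline_runs = [1] * 60 + [0] * 40
low, high = bootstrap_diff_ci(planner_runs, baseline_runs)
```

An interval that excludes zero would indicate that a success-rate gap is unlikely to be an artifact of episode sampling alone; it says nothing about variation across training seeds, which would need separate runs.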

minor comments (1)

Clarify whether the polynomial planner is re-optimized at every time step or only when the RL policy issues a new reference; the current description leaves the closed-loop interaction ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for strengthening the paper. We address each major comment below and will revise the manuscript accordingly to improve the support for our claims.


Referee: [Evaluation] Evaluation section (as summarized in the abstract and described in the results): the manuscript reports aggregate improvements of the full PlanRL pipeline over control-based RL baselines but contains no ablation that removes the Frenet-frame transformation, disables the kinematic feasibility check, or replaces the planner with a direct-control head while holding the RL policy and training regime fixed. Consequently, the central attribution (that the reported 5–11% driving-score and 8–19% success-rate gains are produced by the Frenet prior and feasibility check) remains unsupported.

Authors: We acknowledge that the current manuscript presents only aggregate results for the full PlanRL pipeline and does not include ablations that isolate the Frenet-frame transformation or the kinematic feasibility check while keeping the RL policy and training regime fixed. Such ablations would provide stronger evidence for the specific contributions of these components. We will add them in the revised version, including: (1) a variant using Cartesian coordinates instead of Frenet, (2) a variant disabling the kinematic feasibility check, and (3) a direct-control RL head with identical policy architecture and training, to directly attribute the reported gains.

Revision: yes

Referee: [Abstract] Abstract and implied experimental protocol: no description is given of the precise CARLA versions, traffic densities, number of evaluation episodes, random seeds, or statistical tests used to establish the quoted percentage improvements, rendering it impossible to judge whether the numerical claims are reproducible or statistically reliable.

Authors: We agree that the abstract and evaluation section omit key experimental details necessary for assessing reproducibility. In the revised manuscript we will expand the evaluation protocol description to specify the exact CARLA version, traffic densities, number of evaluation episodes, random seeds used, and any statistical tests applied to the reported improvements.

Revision: yes

Circularity Check

0 steps flagged

No circularity; architecture proposal evaluated empirically


The paper describes an RL architecture integrating a policy with a polynomial planner using Frenet coordinates and a kinematic check, then reports benchmark improvements on CARLA. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps are present in the provided text. The central claims rest on external benchmark comparisons rather than any reduction of outputs to inputs by construction. This is the common case of an empirical systems paper with no mathematical circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no free parameters, mathematical axioms, or new invented entities; the contribution is an architectural proposal using standard components from RL and planning literature.


