Dynamics Distillation for Efficient and Transferable Control Learning

Igor Gilitschenski; Kashyap Chitta; Mahsa Golchoubian; Vladimir Suplin; Xunjiang Gu

arxiv: 2605.01516 · v1 · submitted 2026-05-02 · 💻 cs.RO


Xunjiang Gu, Kashyap Chitta, Mahsa Golchoubian, Vladimir Suplin, Igor Gilitschenski

Pith reviewed 2026-05-09 14:06 UTC · model grok-4.3

classification 💻 cs.RO

keywords: dynamics distillation · reinforcement learning · sim2sim transfer · autonomous driving · vehicle simulation · policy transfer · learned dynamics model


The pith

Distilling high-fidelity vehicle simulator dynamics into a learned parallel model allows reinforcement learning policies to be trained efficiently and transferred back reliably.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Sim2Sim2Sim to resolve the tension between physical realism and computational scalability in training control policies for autonomous driving. It distills the dynamics of a high-fidelity simulator into a fast, highly parallelizable learned dynamics model. Policies are trained entirely inside this distilled environment and then deployed directly into the original simulator. Experiments show faster optimization and successful transfer even under challenging conditions. The work additionally demonstrates that a dynamics model's usefulness for reinforcement learning is better measured by the policies it produces than by its standalone prediction accuracy.

Core claim

By distilling the dynamics of a high-fidelity vehicle simulator into a highly parallelizable learned dynamics model, control policies can be trained purely within the distilled environment and then deployed back into the high-fidelity source simulator, yielding more efficient policy optimization and reliable transfer under challenging dynamics.

What carries the argument

The Sim2Sim2Sim distillation process that converts high-fidelity simulator rollouts into a learned dynamics model used as the sole training environment for reinforcement learning policies.
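To make the distill, train and deploy loop concrete, here is a minimal Python sketch under loudly stated assumptions. A one-dimensional longitudinal toy model (`source_step`, with a made-up drag coefficient) stands in for the expensive high-fidelity simulator, and a linear least-squares model on hand-picked features stands in for whatever learned architecture the paper actually uses. The sketch records source rollouts, fits the learned model, and then advances a batch of environments using only the learned model. All names and constants are hypothetical.

```python
import random

DT = 0.1     # hypothetical integration step [s]
DRAG = 0.02  # hypothetical quadratic drag coefficient [1/m]

def source_step(v, a):
    """Toy stand-in for one step of the expensive high-fidelity simulator."""
    return v + DT * (a - DRAG * v * v)

def features(v, a):
    """Assumed feature map for the learned model, which predicts the velocity delta."""
    return [v, a, v * v]

def collect_rollouts(n_steps, seed=0):
    """Record (features, delta-v) pairs from one random-action source rollout."""
    rng = random.Random(seed)
    data, v = [], 0.0
    for _ in range(n_steps):
        a = rng.uniform(-3.0, 3.0)
        v_next = source_step(v, a)
        data.append((features(v, a), v_next - v))
        v = min(max(v_next, 0.0), 40.0)  # keep the toy state in a plausible range
    return data

def solve(m, b):
    """Solve m @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w

def fit_distilled_model(data):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    k = len(data[0][0])
    xtx = [[sum(x[i] * x[j] for x, _ in data) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in data) for i in range(k)]
    return solve(xtx, xty)

def distilled_step_batch(w, vs, accels):
    """Advance many environments at once using only the cheap learned model."""
    return [v + sum(wi * fi for wi, fi in zip(w, features(v, a)))
            for v, a in zip(vs, accels)]

weights = fit_distilled_model(collect_rollouts(2000))
next_vs = distilled_step_batch(weights, [0.0, 5.0, 20.0], [1.0, 0.0, -2.0])
```

Because the toy source dynamics lie exactly in the span of the chosen features, the fit recovers them almost exactly. A real vehicle simulator offers no such guarantee, which is why transfer back to the source has to be measured rather than assumed.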

If this is right

Policy optimization becomes more efficient because the learned model supports high parallelism unavailable in the original simulator.
Policies trained exclusively in the distilled model achieve reliable transfer when executed in the high-fidelity simulator.
Suitability of a learned dynamics model for reinforcement learning training must be judged by the quality of policies it enables, not solely by its predictive accuracy on rollouts.
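The third point is easy to state and easy to underestimate. The following constructed toy example (hypothetical dynamics and numbers, not taken from the paper) shows how it can happen. Model A has a small drag error everywhere; model B is exact near the 10 m/s target but badly wrong elsewhere. A has the lower one-step prediction error on a broad grid of speeds, yet the constant-throttle policy tuned inside B transfers to the source almost perfectly, while the one tuned inside A settles about 0.49 m/s too fast.

```python
DT, TRUE_DRAG, TARGET = 0.1, 0.02, 10.0  # hypothetical toy values

def source_step(v, a):
    """Toy stand-in for the high-fidelity simulator."""
    return v + DT * (a - TRUE_DRAG * v * v)

def model_a(v, a):
    # Slightly wrong drag everywhere: a small one-step error on every state.
    return v + DT * (a - 0.022 * v * v)

def model_b(v, a):
    # Exact near the target speed, badly wrong (+0.5 m/s per step) elsewhere.
    return source_step(v, a) + (0.0 if 5.0 <= v <= 15.0 else 0.5)

def one_step_mse(model):
    """Prediction accuracy on a broad grid of speeds (zero throttle)."""
    vs = [i * 0.1 for i in range(400)]
    return sum((model(v, 0.0) - source_step(v, 0.0)) ** 2 for v in vs) / len(vs)

def final_speed(step, a, steps=500):
    """Speed reached after holding constant throttle a from standstill."""
    v = 0.0
    for _ in range(steps):
        v = step(v, a)
    return v

def train_policy(model):
    """'Train' a constant-throttle policy by grid search inside the model."""
    candidates = [i / 20 for i in range(81)]  # 0.0 .. 4.0
    return min(candidates, key=lambda a: abs(final_speed(model, a) - TARGET))

def transfer_error(model):
    """Policy quality: tracking error when the model-trained policy runs in the source."""
    return abs(final_speed(source_step, train_policy(model)) - TARGET)
```

Ranking the two models by `one_step_mse` and by `transfer_error` gives opposite answers, which is the failure mode a policy-based evaluation is meant to catch.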

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distillation step could be applied to other high-fidelity simulators in robotics to accelerate policy search.
Iterative refinement of the distilled model using policy performance feedback might further close the gap to the source simulator.
If transfer remains stable, the approach opens a route to training on ensembles of distilled models that capture uncertainty in dynamics.
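One common way to realize such an ensemble, offered here as an assumption rather than anything the paper does, is bootstrap resampling: fit several distilled models on resampled copies of the source data, use their disagreement as an uncertainty signal, and draw a member at random for each training episode. The sketch reduces each "model" to a single fitted drag coefficient in hypothetical toy dynamics.

```python
import random
import statistics

DT = 0.1           # hypothetical integration step [s]
TRUE_DRAG = 0.02   # drag coefficient of the toy "source" dynamics
NOISE = 0.01       # std. dev. of hypothetical measurement noise on delta-v

def collect(n, rng):
    """Noisy (v, a, dv) samples from a toy stand-in for the source simulator."""
    data = []
    for _ in range(n):
        v, a = rng.uniform(0.0, 30.0), rng.uniform(-3.0, 3.0)
        dv = DT * (a - TRUE_DRAG * v * v) + rng.gauss(0.0, NOISE)
        data.append((v, a, dv))
    return data

def fit_drag(data):
    """Least-squares drag estimate, assuming the model form dv = DT * (a - c * v^2)."""
    num = sum((dv - DT * a) * v * v for v, a, dv in data)
    den = DT * sum(v ** 4 for v, _, _ in data)
    return -num / den

def bootstrap_ensemble(data, k, rng):
    """Fit k members, each on a resample (with replacement) of the source data."""
    return [fit_drag([rng.choice(data) for _ in data]) for _ in range(k)]

rng = random.Random(0)
members = bootstrap_ensemble(collect(200, rng), k=8, rng=rng)
spread = statistics.stdev(members)   # ensemble disagreement as an uncertainty signal
episode_drag = rng.choice(members)   # e.g. draw one member per training episode
```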

Load-bearing premise

A learned dynamics model trained to match simulator rollouts will produce policies whose performance transfers reliably back to the original high-fidelity simulator under challenging dynamics.

What would settle it

Train a policy to convergence inside the distilled model, deploy the same policy in the original high-fidelity simulator on the same tasks and dynamics, and observe a large performance drop; such a drop would refute the transfer claim.
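A minimal sketch of that test, assuming deterministic toy dynamics (the hypothetical `make_step`, with two drag values standing in for the distilled and source environments) and a bounded per-step reward: run the same fixed policy from the same initial states in both environments and report the relative loss in return.

```python
DT = 0.1  # hypothetical integration step [s]

def make_step(drag):
    """Toy longitudinal dynamics with a configurable quadratic drag (hypothetical)."""
    return lambda v, a: v + DT * (a - drag * v * v)

def evaluate(policy, step, reward, init_states, horizon):
    """Mean return of a fixed policy over a shared set of initial states."""
    returns = []
    for v in init_states:
        total = 0.0
        for _ in range(horizon):
            v = step(v, policy(v))
            total += reward(v)
        returns.append(total)
    return sum(returns) / len(returns)

def transfer_drop(policy, distilled, source, reward, init_states, horizon=200):
    """Relative return lost when the same policy moves from the distilled model to the source."""
    r_distilled = evaluate(policy, distilled, reward, init_states, horizon)
    r_source = evaluate(policy, source, reward, init_states, horizon)
    return (r_distilled - r_source) / r_distilled

def policy(v):
    return 2.2  # constant throttle, tuned so the distilled model holds 10 m/s

def reward(v):
    return max(0.0, 1.0 - abs(v - 10.0))  # in [0, 1], equal to 1 at the target speed

drop = transfer_drop(policy, make_step(0.022), make_step(0.02), reward, [10.0])
```

Holding the policy and initial states fixed isolates the dynamics mismatch. In this toy case the throttle tuned for the distilled drag overshoots the target in the source and loses a bit over 40% of its return.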

Figures

Figures reproduced from arXiv: 2605.01516 by Igor Gilitschenski, Kashyap Chitta, Mahsa Golchoubian, Vladimir Suplin, Xunjiang Gu.

**Figure 1.** The Sim2Sim2Sim framework operates in three …

**Figure 2.** Example driving scenarios from the WOMD Mini val…

**Figure 3.** Evaluation tracks in BeamNG. Top: Putnam Park Road Course (2.765 km) under nominal asphalt conditions. Bottom: the same track modified with seven ice patches (blue regions) that create friction transitions, with marked entry and exit zones. The ice patches test the policies’ robustness to sudden dynamics changes, where policies must rapidly adapt their control strategy.
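Why ice patches make a demanding transfer test can be seen from a point-mass back-of-the-envelope estimate: the lateral acceleration v^2/R that the tires can sustain is bounded by mu*g, so the maximum cornering speed scales with sqrt(mu). The friction values (0.9 for asphalt, 0.1 for ice) are assumed typical magnitudes, and the patch positions are invented; Figure 3 does not give them.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def friction_at(s, ice_patches, mu_asphalt=0.9, mu_ice=0.1):
    """Friction coefficient at arc length s [m]; all values here are assumptions."""
    for start, end in ice_patches:
        if start <= s < end:
            return mu_ice
    return mu_asphalt

def max_corner_speed(mu, radius):
    """Point-mass limit: lateral acceleration v^2 / R cannot exceed mu * g."""
    return math.sqrt(mu * G * radius)

patches = [(400.0, 450.0), (1200.0, 1260.0)]  # illustrative only, not the Figure 3 layout
for s in (100.0, 420.0):
    mu = friction_at(s, patches)
    print(f"s={s:6.1f} m  mu={mu:.1f}  v_max(R=50 m)={max_corner_speed(mu, 50.0):5.2f} m/s")
```

Under these assumptions a 50 m radius corner drops from about 21 m/s on asphalt to about 7 m/s on ice, a factor of three, so a policy that does not react quickly to the transition will exceed the available grip.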

Abstract

Robust control policy learning for autonomous driving requires training environments to be both physically realistic and computationally scalable, properties that existing simulators provide only in isolation. We introduce Sim2Sim2Sim, a framework that bridges high-fidelity vehicle simulation and scalable reinforcement learning by distilling simulator dynamics into a highly parallelizable learned dynamics model. By training control policies purely within this distilled environment and deploying them back into the high-fidelity source simulator, we demonstrate more efficient policy optimization and reliable transfer under challenging dynamics. We further show that predictive accuracy alone does not fully characterize a learned dynamics model's suitability as a reinforcement learning training environment, which should also be assessed by the quality of the policies it enables.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Sim2Sim2Sim distillation offers a practical route to faster RL training for vehicle control by moving from a high-fidelity simulator to a learned model and back, with the useful reminder that policy quality matters more than pure prediction error.


The paper's main move is to distill dynamics from a high-fidelity vehicle simulator into a fast, parallelizable learned model, train policies entirely inside that model, and then deploy them back into the original simulator. They explicitly note that rollout prediction accuracy by itself does not guarantee a good training environment and that you have to judge the model by the policies it produces. That distinction is straightforward and worth making in the sim-to-real literature for driving tasks. The two-stage pipeline is presented cleanly and the framework name highlights the sim-to-sim transfer step without overclaiming theoretical guarantees. The work stays empirical and focused on practical efficiency and transfer reliability under challenging dynamics. The central claim holds up on its own terms because the authors do not pretend the distilled model is a perfect replacement; they tie success to downstream control performance. The main limitation is that the abstract gives no numbers, baselines, or task details, so it is impossible to tell how large the efficiency gains actually are or whether the transfer remains stable when dynamics vary. If the experiments compare against direct training in the high-fidelity simulator and against standard model-based RL, the results could be useful; without those comparisons the advance stays hard to size. This is for people already working on simulation-based RL for autonomous driving or robotics who need faster inner loops. A reader in that subfield would pick up the evaluation point and the pipeline structure. The paper deserves a serious referee because the idea is concrete, the argument about policy-based assessment is honest, and the empirical claims are falsifiable once the full experiments are examined.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Sim2Sim2Sim framework, which distills dynamics from a high-fidelity vehicle simulator into a learned, highly parallelizable dynamics model. Reinforcement learning policies are trained entirely within this distilled environment and then deployed back into the original high-fidelity simulator. The authors claim this yields more efficient policy optimization and reliable transfer under challenging dynamics for autonomous driving tasks. They further argue that a learned dynamics model's suitability as an RL training environment must be judged by the quality of the policies it produces, not solely by its predictive accuracy on simulator rollouts.

Significance. If the empirical claims are substantiated with detailed results, this work could meaningfully advance scalable control learning in robotics by enabling the use of physically realistic but computationally heavy simulators for large-scale RL without prohibitive costs. The explicit separation of predictive accuracy from downstream policy quality provides a useful evaluation lens for sim-to-sim transfer methods and could influence how future dynamics models are assessed in the field.

major comments (2)

Abstract: The central claims of 'more efficient policy optimization' and 'reliable transfer under challenging dynamics' are stated without any quantitative metrics, baselines, task descriptions, or result summaries, which are load-bearing for assessing whether the framework delivers on its promises.
Method/Experiments (inferred from framework description): The distillation procedure and its loss function are not specified, leaving open whether the learned model preserves the challenging dynamics or if transfer success could arise from simplifications that align with the policy reward in the same simulator, as noted in the stress-test concern.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the potential significance of the Sim2Sim2Sim framework. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.


Referee: Abstract: The central claims of 'more efficient policy optimization' and 'reliable transfer under challenging dynamics' are stated without any quantitative metrics, baselines, task descriptions, or result summaries, which are load-bearing for assessing whether the framework delivers on its promises.

Authors: We agree that the abstract would benefit from greater specificity to allow readers to immediately assess the strength of the claims. In the revised manuscript we will add concise quantitative indicators drawn from the experimental results, including the observed reduction in policy training wall-clock time, the sample-efficiency gains relative to direct training in the high-fidelity simulator, and the transfer success rates under the reported challenging dynamics. We will also name the primary baselines and the autonomous-driving tasks used. revision: yes
Referee: Method/Experiments (inferred from framework description): The distillation procedure and its loss function are not specified, leaving open whether the learned model preserves the challenging dynamics or if transfer success could arise from simplifications that align with the policy reward in the same simulator, as noted in the stress-test concern.

Authors: The referee correctly notes that the current description of the distillation procedure is insufficiently detailed. We will expand the Methods section to provide the exact loss function (a combination of multi-step state-transition prediction error, action-consistency regularization, and a dynamics-complexity penalty), the training data generation protocol, and the optimization hyperparameters. We will further add an explicit analysis and supporting experiments that compare rollout statistics on critical scenarios between the original and distilled models, demonstrating that the challenging dynamics are retained. To address the stress-test concern directly, we will include an ablation that trains policies on deliberately simplified dynamics and shows that such simplifications do not reproduce the transfer performance achieved by our distilled model. revision: yes
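The multi-step term named in this (simulated) rebuttal can be sketched as follows, assuming scalar states, a deterministic learned model and a recorded source trajectory; the action-consistency and dynamics-complexity terms are omitted because neither the rebuttal nor the abstract defines them. The model is rolled out open-loop on its own predictions, so the loss exposes compounding drift that a one-step error hides, which bears directly on the referee's question of whether the distilled model preserves the dynamics that matter. The toy dynamics and the biased model are hypothetical.

```python
DT = 0.1  # hypothetical integration step [s]

def source_step(v, a):
    """Toy stand-in for the high-fidelity simulator."""
    return v + DT * (a - 0.02 * v * v)

def biased_model(v, a):
    """Hypothetical learned model whose drag is 10% too high."""
    return v + DT * (a - 0.022 * v * v)

def multi_step_loss(model_step, states, actions, horizon):
    """Mean squared error of open-loop model rollouts over every window of a recorded trajectory.

    states has one more entry than actions: states[t + 1] is the source's response to actions[t].
    """
    total, count = 0.0, 0
    for t0 in range(len(actions) - horizon + 1):
        s = states[t0]
        for k in range(horizon):
            s = model_step(s, actions[t0 + k])  # feed the model its own prediction
            total += (s - states[t0 + k + 1]) ** 2
            count += 1
    return total / count

# Record a source trajectory under constant throttle, then score the model on it.
actions = [2.0] * 100
states = [0.0]
for a in actions:
    states.append(source_step(states[-1], a))

loss_1 = multi_step_loss(biased_model, states, actions, horizon=1)
loss_10 = multi_step_loss(biased_model, states, actions, horizon=10)
```

Here `loss_10` comes out well above `loss_1` because the drag bias accumulates over the open-loop window, while a model identical to the source scores exactly zero at any horizon.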

Circularity Check

0 steps flagged

No significant circularity detected


The paper describes an empirical framework (Sim2Sim2Sim) for distilling high-fidelity simulator dynamics into a learned parallelizable model, training RL policies inside it, and transferring back to the source simulator. No derivation chain, equations, or load-bearing steps are presented that reduce by construction to fitted inputs, self-citations, or renamed known results. The central claims rest on policy performance demonstrations rather than theoretical reductions or uniqueness theorems. The observation that predictive accuracy alone is insufficient for judging RL suitability is an empirical point, not a circular argument. The framework is self-contained as an empirical demonstration without internal reductions to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. No explicit free parameters, axioms, or invented entities are stated; the learned dynamics model is presented as a trained artifact rather than a newly postulated physical entity.

pith-pipeline@v0.9.0 · 5418 in / 1233 out tokens · 23712 ms · 2026-05-09T14:06:16.565030+00:00 · methodology


Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

Baidu apollo em motion planner,

H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,”arXiv, 2018

work page 2018
[2]

PARA- Drive: Parallelized Architecture for Real-Time Autonomous Driving,

X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone, “PARA- Drive: Parallelized Architecture for Real-Time Autonomous Driving,” inCVPR, 2024

work page 2024
[3]

Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail,

Y . Wang, W. Luo, J. Bai, Y . Cao, T. Che, K. Chen, Y . Chen, J. Di- amond, Y . Ding, W. Ding,et al., “Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail,”arXiv, 2025

work page 2025
[4]

CommonRoad: Compos- able Benchmarks for Motion Planning on Roads,

M. Althoff, M. Koschi, and S. Manzinger, “CommonRoad: Compos- able Benchmarks for Motion Planning on Roads,” inIV, 2017

work page 2017
[5]

A Sequential Two- Step Algorithm for Fast Generation of Vehicle Racing Trajectories,

N. R. Kapania, J. Subosits, and J. C. Gerdes, “A Sequential Two- Step Algorithm for Fast Generation of Vehicle Racing Trajectories,” Journal of Dynamic Systems, Measurement, and Control, 2016

work page 2016
[6]

Minimum Maneuver Time Calculation Using Convex Optimization,

J. P. Timings and D. J. Cole, “Minimum Maneuver Time Calculation Using Convex Optimization,”Journal of Dynamic Systems, Measure- ment, and Control, 2013

work page 2013
[7]

Linear System Identification Versus Physical Modeling of Lateral–Longitudinal Ve- hicle Dynamics,

B. A. H. Vicente, S. S. James, and S. R. Anderson, “Linear System Identification Versus Physical Modeling of Lateral–Longitudinal Ve- hicle Dynamics,”IEEE Transactions on Control Systems Technology, 2021

work page 2021
[8]

Learning- Based Model Predictive Control for Autonomous Racing,

J. Kabzan, L. Hewing, A. Liniger, and M. N. Zeilinger, “Learning- Based Model Predictive Control for Autonomous Racing,”RAL, 2019

work page 2019
[9]

A Physics-Informed Neural Network for the Prediction of Unmanned Surface Vehicle Dynamics,

P.-F. Xuet al., “A Physics-Informed Neural Network for the Prediction of Unmanned Surface Vehicle Dynamics,”Journal of Marine Science and Engineering, 2022

work page 2022
[10]

Deep Dynamics: Vehicle Dy- namics Modeling with a Physics-Constrained Neural Network for Autonomous Racing,

J. Chrosniak, J. Ning, and M. Behl, “Deep Dynamics: Vehicle Dy- namics Modeling with a Physics-Constrained Neural Network for Autonomous Racing,”RAL, 2024

work page 2024
[11]

Neural Network Vehicle Models for High- Performance Automated Driving,

N. A. Spielberget al., “Neural Network Vehicle Models for High- Performance Automated Driving,”Science Robotics, 2019

work page 2019
[12]

End-to-End Neural Network for Vehicle Dynamics Modeling,

L. Hermansdorfer, R. Trauth, J. Betz, and M. Lienkamp, “End-to-End Neural Network for Vehicle Dynamics Modeling,” inCiSt, 2020

work page 2020
[13]

Deep Learning Helicopter Dynamics Models,

A. Punjani and P. Abbeel, “Deep Learning Helicopter Dynamics Models,” inICRA, 2015

work page 2015
[14]

Scalable Deep Kernel Gaussian Process for Vehicle Dynamics in Autonomous Racing,

J. Ning and M. Behl, “Scalable Deep Kernel Gaussian Process for Vehicle Dynamics in Autonomous Racing,” inCoRL, 2023

work page 2023
[15]

Hybrid Physics and Deep Learning Model for Interpretable Vehicle State Prediction,

A. Baier, Z. Boukhers, and S. Staab, “Hybrid Physics and Deep Learning Model for Interpretable Vehicle State Prediction,”arXiv, 2021

work page 2021
[16]

CARLA: An Open Urban Driving Simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “CARLA: An Open Urban Driving Simulator,” inCoRL, 2017

work page 2017
[17]

Pseudo-Simulation for Autonomous Driving,

W. Cao, M. Hallgarten, T. Li, D. Dauner, X. Gu, C. Wang, Y . Miron, M. Aiello, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, A. Geiger, and K. Chitta, “Pseudo-Simulation for Autonomous Driving,” inCoRL, 2025

work page 2025
[18]

Robust Autonomy Emerges from Self-play,

M. Cusumano-Towner, D. Hafner, A. Hertzberg, B. Huval, A. Pe- trenko, E. Vinitsky, E. Wijmans, T. Killian, S. Bowers, O. Sener,et al., “Robust Autonomy Emerges from Self-play,”arXiv, 2025

work page 2025
[19]

Waymax: An Accelerated, Data- Driven Simulator for Large-Scale Autonomous Driving Research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y . Lu, J. Harb, X. Pan, Y . Wang, X. Chen, J. D. Co-Reyes, R. Agarwal, R. Roelofs, Y . Lu, N. Montali, P. Mougin, Z. Yang, B. White, A. Faust, R. McAl- lister, D. Anguelov, and B. Sapp, “Waymax: An Accelerated, Data- Driven Simulator for Large-Scale Autonomous Driving Research,” in NeurIPS, 2023

work page 2023
[20]

Metadrive: Composing Diverse Driving Scenarios for Generalizable Reinforce- ment Learning,

Q. Li, Z. Peng, L. Feng, Q. Zhang, Z. Xue, and B. Zhou, “Metadrive: Composing Diverse Driving Scenarios for Generalizable Reinforce- ment Learning,”PAMI, 2022

work page 2022
[21]

GPUdrive: Data-Driven, Multi-Agent Driving Simulation at 1 Million FPS,

S. Kazemkhani, A. Pandya, D. Cornelisse, B. Shacklett, and E. Vinit- sky, “GPUdrive: Data-Driven, Multi-Agent Driving Simulation at 1 Million FPS,”arXiv, 2024

work page 2024
[22]

An Extensible, Data- Oriented Architecture for High-Performance, Many-World Simula- tion,

B. Shacklett, L. G. Rosenzweig, Z. Xie, B. Sarkar, A. Szot, E. Wij- mans, V . Koltun, D. Batra, and K. Fatahalian, “An Extensible, Data- Oriented Architecture for High-Performance, Many-World Simula- tion,”ACM Trans. Graph., 2023

work page 2023
[23]

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset,

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y . Chai, B. Sapp, C. Qi, Y . Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V . Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset,” inICCV, 2021

work page 2021
[24]

Emma: End-to-end Multimodal Model for Autonomous Driving,

J.-J. Hwang, R. Xu, H. Lin, W.-C. Hung, J. Ji, K. Choi, D. Huang, T. He, P. Covington, B. Sapp,et al., “Emma: End-to-end Multimodal Model for Autonomous Driving,”arXiv, 2024

work page 2024
[25]

Data Scaling Laws for End-to-End Autonomous Driving,

A. Naumann, X. Gu, T. Dimlioglu, M. Bojarski, A. Degirmenci, A. Popov, D. Bisla, M. Pavone, U. Muller, and B. Ivanovic, “Data Scaling Laws for End-to-End Autonomous Driving,” inCVPRW, 2025

work page 2025
[26]

BeamNG.tech

BeamNG GmbH, “BeamNG.tech.”

work page
[27]

A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data,

A. Remonda, N. Hansen, A. Raji, N. Musiu, M. Bertogna, E. E. Veas, and X. Wang, “A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data,” inNeurIPS, 2024

work page 2024
[28]

Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning,

P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subrama- nian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs,et al., “Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning,”Nature, 2022

work page 2022
[29]

CaRL: Learning Scalable Planning Policies with Simple Rewards,

B. Jaeger, D. Dauner, J. Beißwenger, S. Gerstenecker, K. Chitta, and A. Geiger, “CaRL: Learning Scalable Planning Policies with Simple Rewards,” inCoRL, 2025

work page 2025
[30]

Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy,

B. Grooten, P. MacAlpine, K. Subramanian, P. Stone, and P. R. Wurman, “Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy,”arXiv, 2025

work page 2025
[31]

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics,

C. Li, A. Krause, and M. Hutter, “Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics,” arXiv, 2025

work page 2025
[32]

Design and Analysis of Traction Control Strategies for Icy Road Conditions,

M. Mihalkov, C. Caponio, Z. Hankovszki, A. Sorniotti, U. Montanaro, and P. Gruber, “Design and Analysis of Traction Control Strategies for Icy Road Conditions,” inAVEC, 2024

work page 2024
[33]

Adaptive Lane Change Trajectory Planning Scheme for Autonomous Vehicles Under Various Road Frictions and Vehicle Speeds,

J. Hu, Y . Zhang, and S. Rakheja, “Adaptive Lane Change Trajectory Planning Scheme for Autonomous Vehicles Under Various Road Frictions and Vehicle Speeds,”T-IV, 2023

work page 2023
[34]

An Integrated Framework for Autonomous Driving Planning and Tracking Based on NNMPC Considering Road Surface Variations,

Z. Gao, W. Wen, Y . Xing, and A. Tsourdos, “An Integrated Framework for Autonomous Driving Planning and Tracking Based on NNMPC Considering Road Surface Variations,”T-IV, 2025

work page 2025
[35]

High-speed Autonomous Drifting with Deep Reinforcement Learning,

P. Cai, X. Mei, L. Tai, Y . Sun, and M. Liu, “High-speed Autonomous Drifting with Deep Reinforcement Learning,”RAL, 2020

work page 2020
[36]

Deep Reinforcement Learning in Autonomous Car Path Planning and Control: A Survey,

Y . Chen, C. Ji, Y . Cai, T. Yan, and B. Su, “Deep Reinforcement Learning in Autonomous Car Path Planning and Control: A Survey,” arXiv, 2024

work page 2024
[37]

RAPTOR: A Foundation Policy for Quadrotor Control,

J. Eschmann, D. Albani, and G. Loianno, “RAPTOR: A Foundation Policy for Quadrotor Control,”arXiv, 2025

work page 2025
[38]

LocoFormer: Generalist Loco- motion via Long-Context Adaptation,

M. Liu, D. Pathak, and A. Agarwal, “LocoFormer: Generalist Loco- motion via Long-Context Adaptation,” inCoRL, 2025

work page 2025
[39]

Anycar to Anywhere: Learning Universal Dynamics Model for Agile and Adaptive Mobility,

W. Xiao, H. Xue, T. Tao, D. Kalaria, J. M. Dolan, and G. Shi, “Anycar to Anywhere: Learning Universal Dynamics Model for Agile and Adaptive Mobility,” inICRA, 2025

work page 2025
[40]

Residual Learning towards High-Fidelity Vehicle Dynamics Modeling with Transformer,

J. Miao, R. Yan, B. Zhang, T. Wen, J. Li, Z. Fu, K. Jiang, M. Yang, J. Huang, Z. Zhong,et al., “Residual Learning towards High-Fidelity Vehicle Dynamics Modeling with Transformer,”RAL, 2025

work page 2025
[41]

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction,

X. Gu, G. Song, I. Gilitschenski, M. Pavone, and B. Ivanovic, “Producing and Leveraging Online Map Uncertainty in Trajectory Prediction,” inCVPR, 2024

work page 2024
[42]

Wod-e2e: Waymo Open Dataset for End-to-end Driving in Challenging Long-tail Scenarios,

R. Xu, H. Lin, W. Jeon, H. Feng, Y . Zou, L. Sun, J. Gorman, K. Tolstaya, S. Tang, B. White,et al., “Wod-e2e: Waymo Open Dataset for End-to-end Driving in Challenging Long-tail Scenarios,”arXiv, 2025

work page 2025
[43]

Racecar-the Dataset for High- speed Autonomous Racing,

A. Kulkarni, J. Chrosniak, E. Ducote, F. Sauerbeck, A. Saba, U. Chiri- mar, J. Link, M. Behl, and M. Cellina, “Racecar-the Dataset for High- speed Autonomous Racing,” inIROS, 2023

work page 2023
[44]

Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,” inICML, 2018

work page 2018
[45]

Proximal Policy Optimization Algorithms,

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,”arXiv, 2017

work page 2017
[46]

Recent Advanced Control Strategies for Autonomous Vehicles Use of MPC and RL,

B. Patel, R. D. Nirala, and S. Soni, “Recent Advanced Control Strategies for Autonomous Vehicles Use of MPC and RL,”IJEDR, 2025

work page 2025
[47]

A Reduction of Imitation Learning and Structured Prediction to No-regret Online Learning,

S. Ross, G. Gordon, and D. Bagnell, “A Reduction of Imitation Learning and Structured Prediction to No-regret Online Learning,” in AISTATS, 2011

work page 2011

[1] [1]

Baidu apollo em motion planner,

H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,”arXiv, 2018

work page 2018

[2] [2]

PARA- Drive: Parallelized Architecture for Real-Time Autonomous Driving,

X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone, “PARA- Drive: Parallelized Architecture for Real-Time Autonomous Driving,” inCVPR, 2024

work page 2024

[3] [3]

Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail,

Y . Wang, W. Luo, J. Bai, Y . Cao, T. Che, K. Chen, Y . Chen, J. Di- amond, Y . Ding, W. Ding,et al., “Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail,”arXiv, 2025

work page 2025

[4] [4]

CommonRoad: Compos- able Benchmarks for Motion Planning on Roads,

M. Althoff, M. Koschi, and S. Manzinger, “CommonRoad: Compos- able Benchmarks for Motion Planning on Roads,” inIV, 2017

work page 2017

[5] [5]

A Sequential Two- Step Algorithm for Fast Generation of Vehicle Racing Trajectories,

N. R. Kapania, J. Subosits, and J. C. Gerdes, “A Sequential Two- Step Algorithm for Fast Generation of Vehicle Racing Trajectories,” Journal of Dynamic Systems, Measurement, and Control, 2016

work page 2016

[6] [6]

Minimum Maneuver Time Calculation Using Convex Optimization,

J. P. Timings and D. J. Cole, “Minimum Maneuver Time Calculation Using Convex Optimization,”Journal of Dynamic Systems, Measure- ment, and Control, 2013

work page 2013

[7] [7]

Linear System Identification Versus Physical Modeling of Lateral–Longitudinal Ve- hicle Dynamics,

B. A. H. Vicente, S. S. James, and S. R. Anderson, “Linear System Identification Versus Physical Modeling of Lateral–Longitudinal Ve- hicle Dynamics,”IEEE Transactions on Control Systems Technology, 2021

work page 2021

[8] [8]

Learning- Based Model Predictive Control for Autonomous Racing,

J. Kabzan, L. Hewing, A. Liniger, and M. N. Zeilinger, “Learning- Based Model Predictive Control for Autonomous Racing,”RAL, 2019

work page 2019

[9] [9]

A Physics-Informed Neural Network for the Prediction of Unmanned Surface Vehicle Dynamics,

P.-F. Xuet al., “A Physics-Informed Neural Network for the Prediction of Unmanned Surface Vehicle Dynamics,”Journal of Marine Science and Engineering, 2022

work page 2022

[10] [10]

Deep Dynamics: Vehicle Dy- namics Modeling with a Physics-Constrained Neural Network for Autonomous Racing,

J. Chrosniak, J. Ning, and M. Behl, “Deep Dynamics: Vehicle Dy- namics Modeling with a Physics-Constrained Neural Network for Autonomous Racing,”RAL, 2024

work page 2024

[11] [11]

Neural Network Vehicle Models for High- Performance Automated Driving,

N. A. Spielberget al., “Neural Network Vehicle Models for High- Performance Automated Driving,”Science Robotics, 2019

work page 2019

[12] [12]

End-to-End Neural Network for Vehicle Dynamics Modeling,

L. Hermansdorfer, R. Trauth, J. Betz, and M. Lienkamp, “End-to-End Neural Network for Vehicle Dynamics Modeling,” inCiSt, 2020

work page 2020

[13] [13]

Deep Learning Helicopter Dynamics Models,

A. Punjani and P. Abbeel, “Deep Learning Helicopter Dynamics Models,” inICRA, 2015

work page 2015

[14] [14]

Scalable Deep Kernel Gaussian Process for Vehicle Dynamics in Autonomous Racing,

J. Ning and M. Behl, “Scalable Deep Kernel Gaussian Process for Vehicle Dynamics in Autonomous Racing,” inCoRL, 2023

work page 2023

[15] [15]

Hybrid Physics and Deep Learning Model for Interpretable Vehicle State Prediction,

A. Baier, Z. Boukhers, and S. Staab, “Hybrid Physics and Deep Learning Model for Interpretable Vehicle State Prediction,”arXiv, 2021

work page 2021

[16] [16]

CARLA: An Open Urban Driving Simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “CARLA: An Open Urban Driving Simulator,” inCoRL, 2017

work page 2017

[17] [17]

Pseudo-Simulation for Autonomous Driving,

W. Cao, M. Hallgarten, T. Li, D. Dauner, X. Gu, C. Wang, Y . Miron, M. Aiello, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, A. Geiger, and K. Chitta, “Pseudo-Simulation for Autonomous Driving,” inCoRL, 2025

work page 2025

[18] [18]

Robust Autonomy Emerges from Self-play,

M. Cusumano-Towner, D. Hafner, A. Hertzberg, B. Huval, A. Pe- trenko, E. Vinitsky, E. Wijmans, T. Killian, S. Bowers, O. Sener,et al., “Robust Autonomy Emerges from Self-play,”arXiv, 2025

work page 2025

[19] [19]

Waymax: An Accelerated, Data- Driven Simulator for Large-Scale Autonomous Driving Research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y . Lu, J. Harb, X. Pan, Y . Wang, X. Chen, J. D. Co-Reyes, R. Agarwal, R. Roelofs, Y . Lu, N. Montali, P. Mougin, Z. Yang, B. White, A. Faust, R. McAl- lister, D. Anguelov, and B. Sapp, “Waymax: An Accelerated, Data- Driven Simulator for Large-Scale Autonomous Driving Research,” in NeurIPS, 2023

work page 2023

[20] [20]

Metadrive: Composing Diverse Driving Scenarios for Generalizable Reinforce- ment Learning,

Q. Li, Z. Peng, L. Feng, Q. Zhang, Z. Xue, and B. Zhou, “Metadrive: Composing Diverse Driving Scenarios for Generalizable Reinforce- ment Learning,”PAMI, 2022

work page 2022

[21] [21]

GPUdrive: Data-Driven, Multi-Agent Driving Simulation at 1 Million FPS,

S. Kazemkhani, A. Pandya, D. Cornelisse, B. Shacklett, and E. Vinit- sky, “GPUdrive: Data-Driven, Multi-Agent Driving Simulation at 1 Million FPS,”arXiv, 2024

work page 2024

[22] [22]

An Extensible, Data- Oriented Architecture for High-Performance, Many-World Simula- tion,

B. Shacklett, L. G. Rosenzweig, Z. Xie, B. Sarkar, A. Szot, E. Wij- mans, V . Koltun, D. Batra, and K. Fatahalian, “An Extensible, Data- Oriented Architecture for High-Performance, Many-World Simula- tion,”ACM Trans. Graph., 2023

work page 2023

[23] [23]

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset,

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y . Chai, B. Sapp, C. Qi, Y . Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V . Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset,” inICCV, 2021

work page 2021

[24] J.-J. Hwang, R. Xu, H. Lin, W.-C. Hung, J. Ji, K. Choi, D. Huang, T. He, P. Covington, B. Sapp, et al., “Emma: End-to-end Multimodal Model for Autonomous Driving,” arXiv, 2024.

[25] A. Naumann, X. Gu, T. Dimlioglu, M. Bojarski, A. Degirmenci, A. Popov, D. Bisla, M. Pavone, U. Muller, and B. Ivanovic, “Data Scaling Laws for End-to-End Autonomous Driving,” in CVPRW, 2025.

[26] BeamNG GmbH, “BeamNG.tech.”

[27] A. Remonda, N. Hansen, A. Raji, N. Musiu, M. Bertogna, E. E. Veas, and X. Wang, “A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data,” in NeurIPS, 2024.

[28] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, et al., “Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning,” Nature, 2022.

[29] B. Jaeger, D. Dauner, J. Beißwenger, S. Gerstenecker, K. Chitta, and A. Geiger, “CaRL: Learning Scalable Planning Policies with Simple Rewards,” in CoRL, 2025.

[30] B. Grooten, P. MacAlpine, K. Subramanian, P. Stone, and P. R. Wurman, “Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy,” arXiv, 2025.

[31] C. Li, A. Krause, and M. Hutter, “Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics,” arXiv, 2025.

[32] M. Mihalkov, C. Caponio, Z. Hankovszki, A. Sorniotti, U. Montanaro, and P. Gruber, “Design and Analysis of Traction Control Strategies for Icy Road Conditions,” in AVEC, 2024.

[33] J. Hu, Y. Zhang, and S. Rakheja, “Adaptive Lane Change Trajectory Planning Scheme for Autonomous Vehicles Under Various Road Frictions and Vehicle Speeds,” T-IV, 2023.

[34] Z. Gao, W. Wen, Y. Xing, and A. Tsourdos, “An Integrated Framework for Autonomous Driving Planning and Tracking Based on NNMPC Considering Road Surface Variations,” T-IV, 2025.

[35] P. Cai, X. Mei, L. Tai, Y. Sun, and M. Liu, “High-speed Autonomous Drifting with Deep Reinforcement Learning,” RAL, 2020.

[36] Y. Chen, C. Ji, Y. Cai, T. Yan, and B. Su, “Deep Reinforcement Learning in Autonomous Car Path Planning and Control: A Survey,” arXiv, 2024.

[37] J. Eschmann, D. Albani, and G. Loianno, “RAPTOR: A Foundation Policy for Quadrotor Control,” arXiv, 2025.

[38] M. Liu, D. Pathak, and A. Agarwal, “LocoFormer: Generalist Locomotion via Long-Context Adaptation,” in CoRL, 2025.

[39] W. Xiao, H. Xue, T. Tao, D. Kalaria, J. M. Dolan, and G. Shi, “Anycar to Anywhere: Learning Universal Dynamics Model for Agile and Adaptive Mobility,” in ICRA, 2025.

[40] J. Miao, R. Yan, B. Zhang, T. Wen, J. Li, Z. Fu, K. Jiang, M. Yang, J. Huang, Z. Zhong, et al., “Residual Learning towards High-Fidelity Vehicle Dynamics Modeling with Transformer,” RAL, 2025.

[41] X. Gu, G. Song, I. Gilitschenski, M. Pavone, and B. Ivanovic, “Producing and Leveraging Online Map Uncertainty in Trajectory Prediction,” in CVPR, 2024.

[42] R. Xu, H. Lin, W. Jeon, H. Feng, Y. Zou, L. Sun, J. Gorman, K. Tolstaya, S. Tang, B. White, et al., “Wod-e2e: Waymo Open Dataset for End-to-end Driving in Challenging Long-tail Scenarios,” arXiv, 2025.

[43] A. Kulkarni, J. Chrosniak, E. Ducote, F. Sauerbeck, A. Saba, U. Chirimar, J. Link, M. Behl, and M. Cellina, “Racecar-the Dataset for High-speed Autonomous Racing,” in IROS, 2023.

[44] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,” in ICML, 2018.

[45] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv, 2017.

[46] B. Patel, R. D. Nirala, and S. Soni, “Recent Advanced Control Strategies for Autonomous Vehicles Use of MPC and RL,” IJEDR, 2025.

[47] S. Ross, G. Gordon, and D. Bagnell, “A Reduction of Imitation Learning and Structured Prediction to No-regret Online Learning,” in AISTATS, 2011.