Autonomous Vehicle Collision Avoidance With Racing Parameterized Deep Reinforcement Learning
Pith reviewed 2026-05-10 07:45 UTC · model grok-4.3
The pith
Parameterizing deep reinforcement learning with race-car overtaking produces autonomous vehicle collision avoidance policies that outperform model predictive control at vehicle limits with far lower compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Racing-parameterized DRL collision avoidance policies, trained out-of-distribution with a simulator-exploit-aware reward, outperform a Model Predictive Control and Artificial Potential Function (MPC-APF) baseline in intersection avoidance tasks: higher evasion rates, especially for the reversed-heading variant in head-to-head scenarios, at substantially reduced computational cost and with direct hardware transferability.
What carries the argument
Racing parameterization of the DRL policy, which encodes overtaking dynamics to guide collision avoidance without trajectory mimicry, paired with physics-informed rewards for kinodynamic fidelity.
If this is right
- The DRL policies require 31 times fewer floating-point operations and 64 times lower inference latency than the MPC-APF baseline.
- The reversed-heading variant outperforms the default racing policy by 30 percent in head-to-head collisions and the baseline by 50 percent.
- Both DRL policies evade 10 percent more collisions than numerical optimal control in side-collision scenarios.
- Zero-shot transfer succeeds to proportionally scaled hardware without retraining across the tested scenarios.
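The 31x FLOP claim is plausible for a small policy network. As a rough illustration (the layer sizes below are hypothetical, not taken from the paper), a back-of-envelope count shows why one MLP forward pass is cheap compared with an online MPC solve:

```python
# Back-of-envelope FLOP count for a small MLP actor network.
# Layer sizes are hypothetical; this only illustrates why a DRL
# policy's per-step cost can sit well below an online MPC solve.

def mlp_flops(layer_sizes):
    """FLOPs for one forward pass: each Linear(n_in, n_out) costs
    roughly 2 * n_in * n_out (multiply + add), ignoring activations."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical PPO actor: observation -> two hidden layers -> action.
policy_flops = mlp_flops([32, 256, 256, 2])
print(f"policy forward pass: ~{policy_flops:,} FLOPs")
```

An MPC controller, by contrast, re-linearizes and iterates a solver over a prediction horizon at every control step, so its cost grows with horizon length and iteration count rather than being one fixed matrix product.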
Where Pith is reading between the lines
- The racing parameterization could extend to other dynamic multi-agent avoidance problems such as pedestrian or cyclist evasion at intersections.
- Reversed-heading navigation may provide advantages in mixed human-driven and autonomous traffic by enabling counter-flow strategies.
- The reduced computational footprint opens deployment on lower-power embedded hardware in production autonomous vehicles.
Load-bearing premise
The simulator and its physics-informed reward accurately capture real-world nonlinear vehicle kinodynamics without allowing policies to exploit simulation artifacts absent in physical environments.
What would settle it
A side-by-side deployment test on physical scaled vehicles across the three intersection scenarios: the claim would be undermined if the DRL policies lose their evasion advantage over the MPC-APF baseline or show collision rates above simulation levels.
Original abstract
Road traffic accidents are a leading cause of fatalities worldwide. In the US, human error causes 94% of crashes, resulting in excess of 7,000 pedestrian fatalities and $500 billion in costs annually. Autonomous Vehicles (AVs) with emergency collision avoidance systems that operate at the limits of vehicle dynamics at a high frequency, a dual constraint of nonlinear kinodynamic accuracy and computational efficiency, further enhance safety benefits during adverse weather and cybersecurity breaches, and to evade dangerous human driving when AVs and human drivers share roads. This paper parameterizes a Deep Reinforcement Learning (DRL) collision avoidance policy Out-Of-Distribution (OOD) utilizing race car overtaking, without explicit geometric mimicry reference trajectory guidance, in simulation, with a physics-informed, simulator exploit-aware reward to encode nonlinear vehicle kinodynamics. Two policies are evaluated, a default uni-direction and a reversed heading variant that navigates in the opposite direction to other cars, which both consistently outperform a Model Predictive Control and Artificial Potential Function (MPC-APF) baseline, with zero-shot transfer to proportionally scaled hardware, across three intersection collision scenarios, at 31x fewer Floating Point Operations (FLOPS) and 64x lower inference latency. The reversed heading policy outperforms the default racing overtaking policy in head-to-head collisions by 30% and the baseline by 50%, and matches the former in side collisions, where both DRL policies evade 10% greater than numerical optimal control.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a deep reinforcement learning (DRL) policy for autonomous vehicle collision avoidance, parameterized via racing overtaking maneuvers without explicit geometric reference trajectories. A physics-informed, simulator-exploit-aware reward is used to encode nonlinear vehicle kinodynamics during simulation training. Two policies (default uni-directional and reversed-heading) are evaluated against an MPC-APF baseline across three intersection collision scenarios, claiming consistent outperformance, 30-50% gains in specific cases, zero-shot transfer to proportionally scaled hardware, and substantial efficiency advantages (31x fewer FLOPS, 64x lower inference latency).
Significance. If the central claims hold after validation, the work could demonstrate a viable path toward computationally lightweight, high-frequency DRL controllers that operate near vehicle dynamic limits for emergency AV safety. The racing parameterization idea and reported hardware transfer with large efficiency gains would be notable for real-time robotics applications in mixed-traffic settings, provided the simulation accurately captures relevant nonlinear effects.
Major comments (3)
- [Abstract] Abstract: The central claim of outperformance and zero-shot hardware transfer rests on a 'physics-informed, simulator exploit-aware reward' that encodes nonlinear kinodynamics, yet no reward equations, term definitions, or weighting details are supplied. Without these, it is impossible to assess whether the reward truly captures tire forces, slip angles, and actuator limits or instead permits simulator-specific exploits.
- [Evaluation] Evaluation section: The manuscript reports consistent outperformance and specific percentage gains (e.g., 30% in head-to-head, 10% in side collisions) but provides no training curves, statistical tests, variance measures, or ablation results on individual reward components or the racing parameterization. These omissions make it difficult to substantiate that the gains are robust rather than artifacts of the chosen scenarios or baselines.
- [Hardware Experiments] Hardware transfer experiments: Zero-shot transfer to scaled hardware is asserted as a key result supporting real-world applicability, but no quantitative sim-to-real gap metrics, trajectory matching against real vehicle data, or validation of the physics model (e.g., against measured tire forces or slip) are reported. This directly affects the load-bearing claim that the learned policies generalize without degradation.
Minor comments (2)
- [Abstract] The abstract is a single dense paragraph containing multiple distinct claims; splitting it or using clearer sentence structure would improve readability.
- Ensure consistent definition of acronyms on first use (e.g., DRL, OOD, MPC-APF) and verify that all baselines are fully described with implementation details.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify important areas for improving clarity, rigor, and reproducibility. We address each major comment below and will revise the manuscript to incorporate additional details and analyses where feasible.
Point-by-point responses
Referee: [Abstract] Abstract: The central claim of outperformance and zero-shot hardware transfer rests on a 'physics-informed, simulator exploit-aware reward' that encodes nonlinear kinodynamics, yet no reward equations, term definitions, or weighting details are supplied. Without these, it is impossible to assess whether the reward truly captures tire forces, slip angles, and actuator limits or instead permits simulator-specific exploits.
Authors: We agree that explicit reward equations, term definitions, and weighting coefficients are necessary for full assessment and reproducibility. The manuscript describes the reward in the methods section as physics-informed and simulator-exploit-aware, but we will add the complete mathematical formulation, including all terms for tire forces, slip angles, actuator limits, and their weights, in a dedicated subsection of the revised version. This will enable direct evaluation of how nonlinear kinodynamics are encoded without permitting exploits. revision: yes
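A reward of the kind both parties describe could take the following shape. This is a minimal sketch under stated assumptions: every term, coefficient, and threshold below is an illustrative placeholder, not the paper's actual formulation.

```python
import math

# Hypothetical sketch of a physics-informed, exploit-aware reward for
# collision avoidance. All terms and weights are illustrative
# assumptions, not the paper's equations.

def reward(progress, slip_angle, steer_cmd, steer_limit,
           min_obstacle_dist, collided,
           w_prog=1.0, w_slip=0.5, w_act=0.2, w_dist=0.3):
    if collided:
        return -100.0                      # terminal collision penalty
    r = w_prog * progress                  # reward forward progress
    r -= w_slip * slip_angle ** 2          # penalize large tire slip
    # Penalize near-saturated commands (exploit-aware: bang-bang
    # steering that only works in simulation is discouraged).
    r -= w_act * max(0.0, abs(steer_cmd) - 0.9 * steer_limit)
    r += w_dist * math.tanh(min_obstacle_dist)  # bounded clearance bonus
    return r
```

The slip and actuator terms are what make such a reward "physics-informed": they tie the return to quantities (tire slip, steering saturation) that a kinodynamically infeasible policy would otherwise exploit.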
Referee: [Evaluation] Evaluation section: The manuscript reports consistent outperformance and specific percentage gains (e.g., 30% in head-to-head, 10% in side collisions) but provides no training curves, statistical tests, variance measures, or ablation results on individual reward components or the racing parameterization. These omissions make it difficult to substantiate that the gains are robust rather than artifacts of the chosen scenarios or baselines.
Authors: We acknowledge that training curves, variance across seeds, statistical tests, and ablations would strengthen the evaluation. In the revised manuscript we will include learning curves from multiple independent training runs with mean and standard deviation, results of statistical significance tests on the reported performance gains, and ablation studies isolating the contributions of the racing parameterization and individual reward terms. These additions will demonstrate that the outperformance is robust across the three scenarios. revision: yes
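The seed-level analysis the authors promise could be as simple as the following sketch: per-seed evasion rates, their mean and standard deviation, and a Welch t statistic. The per-seed numbers are fabricated placeholders, not results from the paper.

```python
from statistics import mean, stdev

# Illustrative robustness check for an evasion-rate gain across seeds.
# The per-seed rates below are placeholders, not the paper's data.

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

drl_rates = [0.92, 0.90, 0.94, 0.91, 0.93]   # placeholder, 5 seeds
mpc_rates = [0.81, 0.84, 0.80, 0.83, 0.82]   # placeholder, 5 seeds

print(f"DRL mean={mean(drl_rates):.3f} sd={stdev(drl_rates):.3f}")
print(f"MPC mean={mean(mpc_rates):.3f} sd={stdev(mpc_rates):.3f}")
print(f"Welch t = {welch_t(drl_rates, mpc_rates):.2f}")
```

Reporting the t statistic (or a bootstrap interval) alongside per-seed variance is what would distinguish a robust gain from a lucky draw of scenarios.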
Referee: [Hardware Experiments] Hardware transfer experiments: Zero-shot transfer to scaled hardware is asserted as a key result supporting real-world applicability, but no quantitative sim-to-real gap metrics, trajectory matching against real vehicle data, or validation of the physics model (e.g., against measured tire forces or slip) are reported. This directly affects the load-bearing claim that the learned policies generalize without degradation.
Authors: We agree that quantitative sim-to-real metrics are important for validating the zero-shot transfer claim. The current manuscript reports successful zero-shot deployment on proportionally scaled hardware with no collisions, but we will augment the hardware section with available quantitative metrics such as average trajectory deviation between simulation and hardware runs, latency and FLOPS measurements, and any physics-model validation data that can be extracted from the existing experiments. Where direct tire-force or slip measurements are unavailable from the current dataset, we will note this limitation and discuss it explicitly. revision: partial
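The "average trajectory deviation" metric the authors propose admits a simple definition: mean point-wise Euclidean distance between time-aligned simulation and hardware poses. A minimal sketch, with toy placeholder trajectories rather than data from the paper:

```python
import math

# Minimal sketch of an average-trajectory-deviation sim-to-real metric:
# mean Euclidean distance between time-aligned (x, y) poses.
# Trajectories here are toy placeholders, not the paper's data.

def avg_trajectory_deviation(sim, real):
    """Mean point-wise Euclidean distance between two equal-length
    (x, y) trajectories sampled at the same timestamps."""
    assert len(sim) == len(real)
    return sum(math.dist(p, q) for p, q in zip(sim, real)) / len(sim)

sim_xy  = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.4)]
real_xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
print(f"avg deviation: {avg_trajectory_deviation(sim_xy, real_xy):.3f} m")
```

Even this coarse metric, reported per scenario, would quantify the sim-to-real gap that the zero-shot transfer claim currently leaves implicit.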
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper trains DRL policies using a custom racing-parameterized setup and physics-informed reward, then evaluates outperformance against an external MPC-APF baseline plus zero-shot hardware transfer. No load-bearing step reduces a claimed result to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no uniqueness theorem or ansatz is imported via self-citation. The reward is described as simulator-exploit-aware without equations showing it tautologically encodes the target performance. Evaluation relies on independent baselines rather than internal fits.
Reference graph
Works this paper leans on
- [1] Ames, A.D., Xu, X., Grizzle, J.W., and Tabuada, P. (2016). Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8), 3861--3876.
- [2] Ammour, M., Orjuela, R., and Basset, M. (2022). A MPC combined decision making and trajectory planning for autonomous vehicle collision avoidance. IEEE Transactions on Intelligent Transportation Systems, 23(12), 24805--24817.
- [3] Breitling, A., Kupfer, T., Gabriel, F., and Eckert, C. (2021). Security and privacy issues for connected vehicles. 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), 1--6.
- [4] Brunnbauer, A., Berducci, L., Brandstätter, A., Lechner, M., Hasani, R., Rus, D., and Grosu, R. (2022). Latent imagination facilitates zero-shot transfer in autonomous racing. 2022 International Conference on Robotics and Automation (ICRA), 7513--7520.
- [5] Cao, Y., Xiao, C., Cyr, B., Zhou, Y., Park, W., Rampazzi, S., Chen, Q.A., Fu, K., and Mao, Z.M. (2019). Adversarial sensor attack on LiDAR-based perception in autonomous driving. 2019 ACM SIGSAC Conference on Computer and Communications Security, 2267--2281.
- [6] Coumans, E. and Bai, Y. (2016--2021). PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org
- [7] Evans, B.D., Jordaan, H.W., and Engelbrecht, H.A. (2023). Comparing deep reinforcement learning architectures for autonomous racing. Machine Learning with Applications, 14, 100496.
- [8] Everett, M., Chen, Y.F., and How, J.P. (2021). Collision avoidance in pedestrian-rich environments with deep reinforcement learning. IEEE Access, 9, 10357--10377.
- [9] Farrah, J. (2026). Hit the road, mac: The future of self-driving cars. Testimony before the U.S. Senate Committee on Commerce, Science, & Transportation. https://www.commerce.senate.gov/services/files/4B417566-2E6B-4460-B38E-1D745F2146C7
- [10] Feng, S., Qian, Y., and Wang, Y. (2021). Collision avoidance method of autonomous vehicle based on improved artificial potential field algorithm. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 235(14), 3416--3430.
- [11] Funke, J., Brown, M., Erlien, S., and Gerdes, J. (2016). Collision avoidance and stabilization for autonomous vehicles in emergency scenarios. IEEE Transactions on Control Systems Technology, 25(4), 1204--1216.
- [12] Guo, J., Hu, P., and Wang, R. (2016). Nonlinear coordinated steering and braking control of vision-based autonomous vehicles in emergency obstacle avoidance. IEEE Transactions on Intelligent Transportation Systems, 17(11), 3230--3240.
- [13]
- [14] Lee, K. and Kum, D. (2019). Collision avoidance/mitigation system: Motion planning of autonomous vehicle via predictive occupancy map. IEEE Access, 7, 52846--52857.
- [15] Li, G., Yang, Y., Zhang, T., Qu, X., Cao, D., Cheng, B., and Li, K. (2021). Risk assessment based collision avoidance decision-making for autonomous vehicles in multi-scenarios. Transportation Research Part C: Emerging Technologies, 122, 102820.
- [16] Liu, J., Jayakumar, P., Stein, J.L., and Ersal, T. (2017). Combined speed and steering control in high-speed autonomous ground vehicles for obstacle avoidance using model predictive control. IEEE Transactions on Vehicular Technology, 66(10), 8746--8763.
- [17] Lou, X., Yang, Y., Huo, X., and Wang, N. (2019). A survey of security threats and countermeasures in autonomous vehicles. 2019 IEEE 2nd International Conference on Electronics Technology (ICET), 423--428.
- [18] Petit, J. and Shladover, S.E. (2015). Potential cyberattacks on automated vehicles. IEEE Transactions on Intelligent Transportation Systems, 16(2), 546--556.
- [19] Rajabli, N., Flammini, F., Noto, M., and Jafari, R. (2020). Security challenges of connected and automated vehicles. 2020 IEEE International Conference on Smart Computing (SMARTCOMP), 73--80.
- [20] Reichardt, D. and Shick, J. (1994). Collision avoidance in dynamic environments applied to autonomous vehicle guidance on the motorway. Proceedings of the Intelligent Vehicles '94 Symposium, 74--78.
- [21] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- [22] Shang, X. and Eskandarian, A. (2023). Emergency collision avoidance and mitigation using model predictive control and artificial potential function. IEEE Transactions on Intelligent Vehicles, 8(5), 3458--3472.
- [23] Sivashangaran, S. and Eskandarian, A. (2023). XTENTH-CAR: A proportionally scaled experimental vehicle platform for connected autonomy and all-terrain research. ASME International Mechanical Engineering Congress and Exposition, volume 87639, V006T07A068.
- [24] Sivashangaran, S., Khairnar, A., and Eskandarian, A. (2023). AutoVRL: A high fidelity autonomous ground vehicle simulator for sim-to-real deep reinforcement learning. IFAC-PapersOnLine, 56(3), 475--480.
- [25] Sivashangaran, S., Khairnar, A., Gohari, S., Dutta, V., and Eskandarian, A. (2026). Physics-informed reinforcement learning of spatial density velocity potentials for map-free racing. arXiv preprint arXiv:2604.09499.
- [26] Sun, J., Cao, Y., Chen, Q.A., and Mao, Z.M. (2020). Towards robust LiDAR-based perception in autonomous driving: General black-box adversarial sensor attack with principled defenses. 2020 ACM SIGSAC Conference on Computer and Communications Security, 877--894.
- [27] Thompson, M., Dallas, J., Goh, J.Y., and Balachandran, A. (2024). Adaptive nonlinear model predictive control: Maximizing tire force and obstacle avoidance in autonomous vehicles. IEEE Transactions on Field Robotics, 1, 318--331.
- [28] Yuan, Y., Tasik, R., Adhatarao, S.S., Yuan, Y., Liu, Z., and Fu, X. (2020). RACE: Reinforced cooperative autonomous vehicle collision avoidance. IEEE Transactions on Vehicular Technology, 69(9), 9279--9291.
- [29] Zhang, Y., Carballo, A., Yang, H., and Takeda, K. (2023). Perception and sensing for autonomous vehicles under adverse weather conditions: A survey. ISPRS Journal of Photogrammetry and Remote Sensing, 196, 146--177.