pith. sign in

arxiv: 2604.14986 · v1 · submitted 2026-04-16 · 💻 cs.RO

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios

Pith reviewed 2026-05-10 10:36 UTC · model grok-4.3

classification 💻 cs.RO
keywords trajectory optimizationvisually impaired navigationdeep reinforcement learningheuristic samplingmomentum constraintsassistive roboticspath planningmulti-objective optimization
0
0 comments X

The pith

A hybrid heuristic framework with momentum constraints and residual DRL halves convergence iterations for safer, smoother navigation paths in visually impaired scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework to solve multi-objective path planning problems that current methods handle poorly for visually impaired users. It samples candidate trajectories heuristically, then applies momentum constraints to limit sudden velocity and acceleration shifts. A residual-enhanced reinforcement learning step refines those candidates to improve temporal consistency and generalization across scenes. Dual-stage cost modeling first enforces consistency in Frenet coordinates and then adapts weights in Cartesian space to reflect user preferences. If the approach holds, assistive devices could deliver more efficient and interpretable routes that maintain comfort and reduce collision risk in dynamic settings.

Core claim

The MHHTOF framework combines a Heuristic Trajectory Sampling Cluster with Momentum-Constrained Trajectory Optimization to suppress abrupt changes, refines outputs via residual-enhanced DRL for better policy generalization, and uses dual-stage cost modeling with Frenet-space consistency costs plus Cartesian-space adaptive weights driven by rewards. Experiments demonstrate convergence in nearly half the iterations of baselines, lower and more stable costs, and velocity and acceleration profiles that remain stable with reduced risk in complex dynamic scenarios.

What carries the argument

The Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF) that integrates heuristic trajectory sampling, momentum-constrained optimization, residual DRL refinement, and dual-stage cost modeling in Frenet and Cartesian spaces.

If this is right

  • The method reaches stable solutions in roughly half the iterations required by existing planners.
  • Optimization costs remain lower and exhibit less variation across runs.
  • Velocity and acceleration traces stay smooth even inside crowded, changing environments.
  • The dual cost stages allow explicit incorporation of individual comfort and safety priorities without losing interpretability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The residual DRL component could support transfer to new user groups or sensor suites with minimal retraining.
  • Frenet-space consistency combined with Cartesian reward weighting might generalize to other mobile-robot domains that must respect both local smoothness and global user intent.
  • Stable convergence behavior suggests the framework could serve as a reliable inner loop inside larger lifelong-learning pipelines for assistive devices.

Load-bearing premise

That the combination of heuristic sampling, momentum limits, residual DRL, and dual cost modeling will continue to work when real-world obstacles, user preferences, and environments differ from the tested cases.

What would settle it

A field deployment with unpredictable moving obstacles and diverse user preference inputs in which the framework requires more iterations or produces higher peak velocities and accelerations than baseline planners.

Figures

Figures reproduced from arXiv: 2604.14986 by Bo Gong, JiaLing Xiao, Jingya Wang, Liyong Ren, Manping Fan, Yongbin Yu, You Zhou, Yuting Zeng, Zhiwen Zheng.

Figure 1
Figure 1. Figure 1: Schematic diagram of the conversion trajectory from the Frenet coordinate to the Cartesian Coordinate. frame enhances trajectory quality. A weight transfer mechanism aligns both stages, enabling interpretable and personalized optimization for visually impaired users. This paper is organized as follows: Section 2 introduces the theoretical foundation and modeling assumptions; Section 3 describes the overall… view at source ↗
Figure 2
Figure 2. Figure 2: Overall architecture of MHHTOF. Remark 1. Prior works [29, 28, 30] restricted to Frenet-frame optimization provide efficient feasibility but fail to capture semantic and social adaptability, whereas Cartesian-only learning methods [31] encode rich context but often sacrifice geometric consistency. The proposed dual-stage DCMM reconciles these limitations: the first stage secures feasible and smooth motion … view at source ↗
Figure 3
Figure 3. Figure 3: Flowchart of HTSCMOE. 𝑠(𝑡) denotes longitudinal evolution over time, while 𝑑(𝑠) captures lateral deviation with respect to the longitudinal displacement. This representation maintains coupling expressivity while enabling efficient computation. Given the boundary state at both the start and end of each segment, the coefficients 𝑎𝑖 and 𝑏𝑖 are computed analytically: For longitudinal motion: 𝑎[0∶2] = [ 𝑠 ( 𝑡0 … view at source ↗
Figure 4
Figure 4. Figure 4: Schematic of MTO process for visually impaired scenario. In this formulation, 𝐸motion,𝑖 represents the kinetic energy of agent, while 𝐸guidance,𝑖 reflects potential energy induced by environmental constraints such as road signs, speed limits, and the behaviors of surrounding agents in assistive navigation scenarios. This formulation allows agent dynamics to be expressed in terms of energy trade-offs and pr… view at source ↗
Figure 5
Figure 5. Figure 5: Proposed Residual-enhanced DRL framework with temporal modeling based on PPO [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Residual-enhanced temporal architecture for Actor Network and Critic Network, Where the policy and reward networks correspond to the actor and critic components, respectively [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Line graph comparing the change of mean reward with the number of training steps for different algorithm training processes, where (a) shows the training performance of all algorithms, (b) shows the performance of algorithms that converged after training, and (c) shows the performance of algorithms that did not converge after training [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Line graph comparing the change of average ep length with the number of training steps for different algorithm training processes, where (a) shows the training performance of all algorithms, (b) shows the performance of algorithms that converged after training, and (c) shows the performance of algorithms that did not converge after training. 4.3. Episode Length Evaluation Episode length reflects the effici… view at source ↗
Figure 9
Figure 9. Figure 9: Box plots of evaluation metrics for the best models of the two algorithms, where (a) shows cost metrics including mean and variance, and (b) shows risk metrics including mean and variance. The LSTM-ResB-PPO consistently outperforms other variants by balancing rapid convergence, stable FP and CP, and reliable episode length behavior, while avoiding the instability or collapse observed in alternative encoder… view at source ↗
Figure 10
Figure 10. Figure 10: Visualization results of trajectory planning in two CommonRoad scenarios, where (a) DEU_Lengede-21_1_T-15 shows the algorithmic simulation outcomes and (b) ZAM_Junction-1_119_T-1 illustrates comparative trajectory planning results. topology, completing the task without violation, thereby evidencing its superior understanding of interaction and environmental constraints. The velocity and acceleration profi… view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of two algorithms under the DEU_Lengede-21_1_T-15 scenario, where (a) speed curves along the S-direction position, (b) acceleration curves along the S-direction position, and (c) line comparison of cumulative weighted cost are illustrated. (a) (b) [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Variation of cumulative weighted cost and cumulative predicted action with s-direction position for two algorithms under the DEU_Lengede-21_1_T-15 Scenario: (a) line variation of cumulative predicted action for LSTM￾ResB-PPO; (b) line variation of cumulative predicted action for the baseline algorithm. Overall, this scenario shows that residual-enhanced temporal modeling enables the proposed framework to … view at source ↗
Figure 13
Figure 13. Figure 13: Comparison of two algorithms under the ZAM_Junction-1_119_T-1 scenario, where (a) speed curves along the S-direction position, (b) acceleration curves along the S-direction position, and (c) line comparison of cumulative weighted cost are illustrated. naturally divides into three interpretable phases: left-turn interaction, deceleration, and stabilization. Each phase is characterized by more continuous an… view at source ↗
Figure 14
Figure 14. Figure 14: Variation of cumulative weighted cost and cumulative predicted action with s-direction position for two algorithms under the ZAM_Tjunction-1_119_T-1 scenario: (a) line variation of cumulative predicted action for LSTM￾ResB-PPO; (b) line variation of cumulative predicted action for the baseline algorithm [PITH_FULL_IMAGE:figures/full_fig_p022_14.png] view at source ↗
read the original abstract

Safe and efficient assistive planning for visually impaired scenarios remains challenging, since existing methods struggle with multi-objective optimization, generalization, and interpretability. In response, this paper proposes a Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF). To balance multiple objectives of comfort and safety, the framework designs a Heuristic Trajectory Sampling Cluster (HTSC) with a Momentum-Constrained Trajectory Optimization (MTO), which suppresses abrupt velocity and acceleration changes. In addition, a novel residual-enhanced deep reinforcement learning (DRL) module refines candidate trajectories, advancing temporal modeling and policy generalization. Finally, a dual-stage cost modeling mechanism (DCMM) is introduced to regulate optimization, where costs in the Frenet space ensure consistency, and reward-driven adaptive weights in the Cartesian space integrate user preferences for interpretability and user-centric decision-making. Experimental results show that the proposed framework converges in nearly half the iterations of baselines and achieves lower and more stable costs. In complex dynamic scenarios, MHHTOF further demonstrates stable velocity and acceleration curves with reduced risk, confirming its advantages in robustness, safety, and efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF) for assistive trajectory planning in visually impaired scenarios. It integrates a Heuristic Trajectory Sampling Cluster (HTSC) with Momentum-Constrained Trajectory Optimization (MTO) to limit abrupt velocity/acceleration changes, a residual-enhanced DRL module for candidate refinement and temporal generalization, and a dual-stage cost modeling mechanism (DCMM) that enforces consistency in Frenet space while using reward-driven adaptive weights in Cartesian space to incorporate user preferences. The central experimental claim is that MHHTOF converges in nearly half the iterations of baselines, yields lower and more stable costs, and produces stable velocity/acceleration profiles with reduced risk in complex dynamic scenarios.

Significance. If the performance claims are substantiated with proper controls, the hybrid heuristic-DRL approach could advance safe, interpretable navigation for visually impaired users by addressing multi-objective trade-offs between comfort, safety, and personalization. The momentum constraints and dual-space cost modeling offer a concrete mechanism for smoothness and user-centric adaptation that existing pure optimization or pure learning methods often lack.

major comments (2)
  1. [Experimental Results] Experimental Results section: The assertion that MHHTOF 'converges in nearly half the iterations of baselines' and achieves 'lower and more stable costs' with 'reduced risk' supplies no information on baseline algorithms, scenario generation (obstacle density, predictability, user-preference sampling), evaluation metrics, number of independent runs, or statistical tests. Without these, the reported advantages cannot be isolated from implementation artifacts or narrow test conditions and therefore do not support the central claim of superiority.
  2. [Method] Method section (DCMM and residual DRL): The reward-driven adaptive weights are described only at the level of 'integrate user preferences'; no explicit formulation, update rule, or ablation isolating their contribution is provided. This leaves open the possibility that the reported stability and risk reduction are artifacts of the particular training distribution rather than a general property of the framework.
minor comments (1)
  1. Several acronyms (MHHTOF, HTSC, MTO, DCMM) are introduced in the abstract and early text without immediate expansion on first use in the main body.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough review and constructive criticism of our work on the Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF). The comments highlight important aspects that require clarification and expansion to strengthen the paper's contributions. We address each major comment in detail below, outlining the specific revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental Results section: The assertion that MHHTOF 'converges in nearly half the iterations of baselines' and achieves 'lower and more stable costs' with 'reduced risk' supplies no information on baseline algorithms, scenario generation (obstacle density, predictability, user-preference sampling), evaluation metrics, number of independent runs, or statistical tests. Without these, the reported advantages cannot be isolated from implementation artifacts or narrow test conditions and therefore do not support the central claim of superiority.

    Authors: We agree that additional details are necessary to fully support the experimental claims. In the revised version of the manuscript, we will enhance the Experimental Results section by providing comprehensive information on the baseline algorithms, including their names, key parameters, and implementation specifics. We will also describe the scenario generation process in detail, covering aspects such as obstacle density, predictability, and user-preference sampling. Furthermore, we will specify the evaluation metrics, report the number of independent runs conducted, and present results from statistical tests to validate the observed improvements. These additions will help isolate the contributions of MHHTOF and address concerns about potential artifacts. revision: yes

  2. Referee: [Method] Method section (DCMM and residual DRL): The reward-driven adaptive weights are described only at the level of 'integrate user preferences'; no explicit formulation, update rule, or ablation isolating their contribution is provided. This leaves open the possibility that the reported stability and risk reduction are artifacts of the particular training distribution rather than a general property of the framework.

    Authors: We recognize the need for more explicit details on the reward-driven adaptive weights within the DCMM. In the revision, we will include the mathematical formulation of these weights, the specific update rule employed, and the reward function used for adaptation. Additionally, we will conduct and report an ablation study that compares the full framework against variants without the adaptive weights or with fixed weights. This study will be performed across different training distributions to demonstrate that the benefits in stability and risk reduction are not limited to the original training conditions but represent a general property of the approach. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims

full rationale

The paper presents an engineering framework (MHHTOF) combining heuristic sampling (HTSC), momentum-constrained optimization (MTO), residual DRL refinement, and dual-stage cost modeling (DCMM with Frenet/Cartesian spaces and reward-driven weights). Performance claims rest on experimental comparisons in simulation rather than a closed mathematical derivation. No equations or steps are quoted that reduce a 'prediction' or result to fitted inputs by construction, nor is there load-bearing self-citation for uniqueness theorems. The reward-driven weights are an explicit design element for user preferences, not shown to be tautological. The derivation chain from components to reported convergence/safety advantages is self-contained and externally falsifiable via the described experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 5 invented entities

Only the abstract is available, so the ledger is limited to explicitly named components introduced by the authors. No numerical free parameters, mathematical axioms, or independently evidenced entities can be extracted.

invented entities (5)
  • MHHTOF no independent evidence
    purpose: Overall proposed framework for momentum-constrained hybrid heuristic trajectory optimization
    New acronym and system name defined in the abstract
  • HTSC no independent evidence
    purpose: Heuristic Trajectory Sampling Cluster for generating candidate paths
    Component introduced to balance comfort and safety
  • MTO no independent evidence
    purpose: Momentum-Constrained Trajectory Optimization to suppress abrupt velocity and acceleration changes
    Optimization submodule within HTSC
  • residual-enhanced DRL no independent evidence
    purpose: Refines candidate trajectories and improves temporal modeling and policy generalization
    Novel DRL variant described in the abstract
  • DCMM no independent evidence
    purpose: Dual-stage cost modeling mechanism using Frenet and Cartesian spaces with adaptive weights
    Mechanism for regulating optimization and incorporating user preferences

pith-pipeline@v0.9.0 · 5523 in / 1495 out tokens · 38541 ms · 2026-05-10T10:36:43.031257+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

  1. [1]

    A. P. Kalidas, C. J. Joshua, A. Q. Md, S. Basheer, S. Mohan, S. Sakri, Deep reinforcement learning for vision-based navigation of uavs in avoiding stationary and mobile obstacles, Drones 7 (4) (2023) 245

  2. [2]

    A. B. Najjar, A. R. Al-Issa, M. Hosny, Dynamic indoor path planning for the visually impaired, Journal of King Saud University-Computer and Information Sciences 34 (9) (2022) 7014–7024

  3. [3]

    S.M.Shin,J.Lim,Y.Choi,Guidedogar:Atactileandauditoryassistingdevicedesignwiththemotifofaguidedogforthevisuallyimpaired, International Journal of Human–Computer Interaction (2024) 1–14

  4. [4]

    H. R. Surougi, J. A. McCann, Real-time optimisation-based path planning for visually impaired people in dynamic environments, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 1839–1848

  5. [5]

    I.Patel,M.Kulkarni,N.Mehendale,Reviewofsensor-drivenassistivedevicetechnologiesforenhancingnavigationforthevisuallyimpaired, Multimedia Tools and Applications 83 (17) (2024) 52171–52195

  6. [6]

    S. Teng, X. Hu, P. Deng, B. Li, Y. Li, Y. Ai, D. Yang, L. Li, Z. Xuanyuan, F. Zhu, et al., Motion planning for autonomous driving: The state of the art and future perspectives, IEEE Transactions on Intelligent Vehicles 8 (6) (2023) 3692–3711

  7. [7]

    C.Zhang,W.Xu,Intelligentvehiclepathbasedondiscretizedsamplingpointsandimprovedcostfunction:Aquadraticprogrammingapproach, IEEE Access 12 (2024) 24500–24515

  8. [8]

    J.Wang,L.Chu,Y.Zhang,Y.Mao,C.Guo,Intelligentvehicledecision-makingandtrajectoryplanningmethodbasedondeepreinforcement learning in the frenet space, Sensors 23 (24) (2023) 9819

  9. [9]

    IEEE Journal of Biomedical and Health Informatics , author =

    S. Siboo, A. Bhattacharyya, R. Naveen Raj, S. H. Ashwin, An empirical study of ddpg and ppo-based reinforcement learning algorithms for autonomous driving, IEEE Access 11 (2023) 125094–125108.doi:10.1109/ACCESS.2023.3330665

  10. [10]

    Q. Xiao, L. Jiang, M. Wang, X. Zhang, An improved distributed sampling ppo algorithm based on beta policy for continuous global path planning scheme, Sensors 23 (13) (2023) 6101

  11. [11]

    R.Zhang,J.Hou,G.Chen,Z.Li,J.Chen,A.Knoll,Residualpolicylearningfacilitatesefficientmodel-freeautonomousracing,IEEERobotics and Automation Letters 7 (4) (2022) 11625–11632

  12. [12]

    S.Wen,Y.Shu,A.Rad,Z.Wen,Z.Guo,S.Gong,Adeepresidualreinforcementlearningalgorithmbasedonsoftactor-criticforautonomous navigation, Expert Systems with Applications 259 (2025) 125238

  13. [13]

    R.Zhang,X.Qin,M.Pan,S.Li,H.Shen,Adaptivetemporalreinforcementlearningformappingcomplexmaritimeenvironmentalstatespaces in autonomous ship navigation, Journal of Marine Science and Engineering 13 (3) (2025) 514

  14. [14]

    Z.Zhang,C.Shi,P.Zhu,Z.Zeng,H.Zhang,Autonomousexplorationofmobilerobotsviadeepreinforcementlearningbasedonspatiotemporal information on graph, Applied Sciences 11 (18) (2021) 8299

  15. [15]

    Trauth, A

    R. Trauth, A. Hobmeier, J. Betz, A reinforcement learning-boosted motion planning framework: Comprehensive generalization performance in autonomous driving, in: 2024 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2024, pp. 2413–2420

  16. [16]

    S.Zhao,S.-H.Hwang,Exploration-andexploitation-drivendeepdeterministicpolicygradientforactiveslaminunknownindoorenvironments, Electronics 13 (5) (2024) 999

  17. [17]

    N.Fernando,D.A.McMeekin,I.Murray,Routeplanningmethodsinindoornavigationtoolsforvisionimpairedpersons:asystematicreview, Disability and Rehabilitation: Assistive Technology 18 (6) (2023) 763–782

  18. [18]

    Nawaz, K

    W. Nawaz, K. U. Khan, K. Bashir, A review on path selection and navigation approaches towards an assisted mobility of visually impaired people, KSII Transactions on Internet and Information Systems (TIIS) 14 (8) (2020) 3270–3294

  19. [19]

    D. Chen, S. Li, J. Wang, Y. Feng, Y. Liu, A multi-objective trajectory planning method based on the improved immune clonal selection algorithm, Robotics and computer-integrated manufacturing 59 (2019) 431–442

  20. [20]

    Balatti, I

    P. Balatti, I. Ozdamar, D. Sirintuna, L. Fortini, M. Leonori, J. M. Gandarias, A. Ajoudani, Robot-assisted navigation for visually impaired through adaptive impedance and path planning, in: 2024 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2024, pp. 2310–2316

  21. [21]

    First Author et al.:Preprint submitted to ElsevierPage 23 of 24 Short Title of the Article

    B.Li,Y.Ouyang,L.Li,Y.Zhang,Autonomousdrivingoncurvyroadswithoutrelianceonfrenetframe:Acartesian-basedtrajectoryplanning method, IEEE Transactions on Intelligent Transportation Systems 23 (9) (2022) 15729–15741. First Author et al.:Preprint submitted to ElsevierPage 23 of 24 Short Title of the Article

  22. [23]

    B. Li, Y. Zhang, Fast trajectory planning in cartesian rather than frenet frame: A precise solution for autonomous driving in complex urban scenarios, IFAC-PapersOnLine 53 (2) (2020) 17065–17070

  23. [24]

    Morsali, E

    M. Morsali, E. Frisk, J. Åslund, Spatio-temporal planning in multi-vehicle scenarios for autonomous vehicle using support vector machines, IEEE Transactions on Intelligent Vehicles 6 (4) (2020) 611–621

  24. [25]

    Zhang, S

    Y. Zhang, S. Wang, Lspp: A novel path planning algorithm based on perceiving line segment feature, IEEE Sensors Journal 22 (1) (2021) 720–731

  25. [26]

    Shu, N.-D

    K. Shu, N.-D. Ðào, W. Shi, A. Khajepour, Group frenet frame cav path planning on highways, IEEE Internet of Things Journal 11 (4) (2023) 6776–6787

  26. [27]

    Huang, Z

    J. Huang, Z. He, Y. Arakawa, B. Dawton, Trajectory planning in frenet frame via multi-objective optimization, IEEE Access 11 (2023) 70764–70777

  27. [28]

    Y. Wang, Z. Lin, Research on path planning for autonomous vehicle based on frenet system, Journal of engineering research 11 (2) (2023) 100080

  28. [29]

    J.Wang,J.Wu,X.Zheng,D.Ni,K.Li,Drivingsafetyfieldtheorymodelinganditsapplicationinpre-collisionwarningsystem,Transportation research part C: emerging technologies 72 (2016) 306–324

  29. [30]

    R.Zhang,H.Guo,M.A.Sotelo,H.Du,A.Darius,Z.Li,Newrrt-basedmethodforvehiclepathplanningincurvescenariosconsideringpath oscillations, IEEE Transactions on Vehicular Technology (2025)

  30. [31]

    Bajpai, A

    A. Bajpai, A. Lu, K. Choi, R. Tayal, A. Young, A. Mazumdar, Improving human situational awareness and planning using a human-centric velocity-obstacle algorithm, ACM Transactions on Human-Robot Interaction 14 (3) (2025) 1–28

  31. [32]

    B.Zhao,Y.Wu,C.Wu,R.Sun,Deepreinforcementlearningtrajectoryplanningforroboticmanipulatorbasedonsimulation-efficienttraining, Scientific Reports 15 (1) (2025) 8286

  32. [33]

    Y. Qin, Y. Huang, W. Yu, H. Wang, Roitp: Road obstacle-involved trajectory planner for autonomous trucks, Chinese Journal of Mechanical Engineering 38 (1) (2025) 9

  33. [34]

    M.Jin,M.Qu,Q.Gao,Z.Huang,T.Su,Z.Liang,Advancedtrajectoryplanningandcontrolforautonomousvehicleswithquinticpolynomials, Sensors 24 (24) (2024) 7928

  34. [35]

    X.Qian,F.Altché,P.Bender,C.Stiller,A.deLaFortelle,Optimaltrajectoryplanningforautonomousdrivingintegratinglogicalconstraints: An miqp perspective, in: 2016 IEEE 19th international conference on intelligent transportation systems (ITSC), IEEE, 2016, pp. 205–210

  35. [36]

    J. Wang, J. Wu, Y. Li, The driving safety field based on driver–vehicle–road interactions, IEEE Transactions on Intelligent Transportation Systems 16 (4) (2015) 2203–2214

  36. [37]

    Helbing, J

    D. Helbing, J. Keltsch, P. Molnar, Modelling the evolution of human trail systems, Nature 388 (6637) (1997) 47–50

  37. [38]

    K. Chu, M. Lee, M. Sunwoo, Local path planning for off-road autonomous driving with avoidance of static obstacles, IEEE transactions on intelligent transportation systems 13 (4) (2012) 1599–1616

  38. [39]

    Zhang, D

    C. Zhang, D. Chu, S. Liu, Z. Deng, C. Wu, X. Su, Trajectory planning and tracking for autonomous vehicle based on state lattice and model predictive control, IEEE Intelligent Transportation systems magazine 11 (2) (2019) 29–40

  39. [40]

    R.Trauth,K.Moller,G.Würsching,J.Betz,Frenetix:Ahigh-performanceandmodularmotionplanningframeworkforautonomousdriving, IEEE Access (2024)

  40. [41]

    Brito, A

    B. Brito, A. Agarwal, J. Alonso-Mora, Learning interaction-aware guidance policies for motion planning in dense traffic scenarios, arXiv preprint arXiv:2107.04538 (2021)

  41. [42]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017)

  42. [43]

    K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  43. [44]

    Hochreiter, J

    S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation 9 (8) (1997) 1735–1780

  44. [45]

    D.Dakopoulos,N.G.Bourbakis,Wearableobstacleavoidanceelectronictravelaidsforblind:asurvey,IEEETransactionsonSystems,Man, and Cybernetics, Part C (Applications and Reviews) 40 (1) (2009) 25–35

  45. [46]

    Fertig, L

    A. Fertig, L. Balasubramanian, M. Botsch, Hybrid machine learning model with a constrained action space for trajectory prediction, arXiv preprint arXiv:2501.03666 (2025)

  46. [47]

    H.Khan,T.D.Chaudhari,J.V.N.Ramesh,A.S.Kranthi,E.Muniyandy,Y.A.B.El-Ebiary,D.N.P.Devadhas,Neuro-symbolicreinforcement learningforcontext-awaredecisionmakinginsafeautonomousvehicles.,InternationalJournalofAdvancedComputerScience&Applications 16 (5) (2025)

  47. [48]

    C.Glanois,P.Weng,M.Zimmer,D.Li,T.Yang,J.Hao,W.Liu,Asurveyoninterpretablereinforcementlearning,MachineLearning113(8) (2024) 5847–5890

  48. [49]

    M.Althoff,M.Koschi,S.Manzinger,Commonroad:Composablebenchmarksformotionplanningonroads,in:2017IEEEIntelligentVehicles Symposium (IV), IEEE, 2017, pp. 719–726. First Author et al.:Preprint submitted to ElsevierPage 24 of 24