Dynamic-TD3: A Novel Algorithm for UAV Path Planning with Dynamic Obstacle Trajectory Prediction

Jingtang Chen; Mingjian Fu; Tiantian Li; Wentao Chen; Wenxi Liu; Youfeng Su; Yuanlong Yu

arxiv: 2605.00059 · v1 · submitted 2026-04-30 · 💻 cs.RO · cs.AI

Wentao Chen, Jingtang Chen, Mingjian Fu, Tiantian Li, Youfeng Su, Wenxi Liu, Yuanlong Yu

Pith reviewed 2026-05-09 20:56 UTC · model grok-4.3

keywords: UAV path planning · dynamic obstacles · deep reinforcement learning · collision avoidance · constrained Markov decision process · trajectory prediction · sensor noise · energy efficient flight

The pith

Dynamic-TD3 uses constrained reinforcement learning and physical filters to improve UAV collision avoidance and energy use amid moving obstacles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Dynamic-TD3 to resolve the safety-exploration tradeoff in deep reinforcement learning for drone navigation in risky settings with moving threats. It frames the problem as a Constrained Markov Decision Process and adds an Adaptive Trajectory Relational Evolution Mechanism to track long-range obstacle paths plus a Physically Aware Gated Kalman Filter to reduce the impact of noisy or changing sensor readings. The combined state then feeds a policy that trades off task goals against hard safety limits using Lagrangian relaxation. Experiments show the resulting flights avoid collisions more effectively, consume less energy, and follow smoother paths than earlier methods. Readers interested in practical autonomous flight would care because soft-penalty training risks crashes while rigid constraint methods often make drones overly cautious or inefficient.

Core claim

Dynamic-TD3 is a physically enhanced framework that enforces strict safety constraints while maintaining maneuverability by modeling navigation as a Constrained Markov Decision Process. This framework integrates an Adaptive Trajectory Relational Evolution Mechanism to capture long-range intentions and employs a Physically Aware Gated Kalman Filter to mitigate non-stationary observation noise. The resulting state representation drives a dual-criterion policy that balances mission efficiency against hard safety constraints via Lagrangian relaxation. In experiments with aggressive dynamic threats, this approach demonstrates superior collision avoidance performance, reduced energy consumption, and smoother flight trajectories.
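For orientation, the textbook form of a CMDP and its Lagrangian relaxation can be written as follows. The paper's own reward, cost, and threshold definitions are not reproduced on this page, so the symbols $r$, $c$, $d$, $\gamma$, and $\lambda$ are generic placeholders and should not be read as the authors' notation.

```latex
\begin{align}
\max_{\pi}\; J_r(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
J_c(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d, \\
\mathcal{L}(\pi, \lambda) &= J_r(\pi) - \lambda \left( J_c(\pi) - d \right),
\qquad
\min_{\lambda \ge 0}\; \max_{\pi}\; \mathcal{L}(\pi, \lambda).
\end{align}
```

In this standard form the multiplier $\lambda$ acts as an adaptive price on safety cost: it grows while $J_c(\pi) > d$ and shrinks toward zero once the constraint is satisfied.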

What carries the argument

Adaptive Trajectory Relational Evolution Mechanism (ATREM) and Physically Aware Gated Kalman Filter (PAG-KF) inside a Constrained Markov Decision Process (CMDP) that produces the state for a Lagrangian-relaxed dual-criterion policy.
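To make the idea of a gated Kalman filter concrete, here is a minimal sketch of a generic scalar filter with innovation gating. It is not the paper's PAG-KF, whose gating and update rules are not given on this page. The random-walk motion model, the noise variances, and the gate threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatedKalman1D:
    """Scalar random-walk Kalman filter with an innovation gate (illustrative only)."""
    x: float = 0.0      # state estimate
    p: float = 1.0      # estimate variance
    q: float = 0.01     # process-noise variance (assumed value)
    r: float = 0.25     # measurement-noise variance (assumed value)
    gate: float = 9.0   # threshold on the normalized innovation squared (about 3 sigma)

    def step(self, z: float) -> bool:
        """Fuse measurement z and return True if it passed the gate."""
        # Predict: a random-walk model keeps the mean and grows the variance.
        self.p += self.q
        # Innovation and its variance.
        y = z - self.x
        s = self.p + self.r
        # Gate: skip measurements that are implausible under the prediction.
        if y * y / s > self.gate:
            return False
        # Standard Kalman update.
        k = self.p / s
        self.x += k * y
        self.p *= 1.0 - k
        return True


kf = GatedKalman1D()
readings = [1.0, 1.1, 0.9, 25.0, 1.05]  # 25.0 is an injected outlier
accepted = [kf.step(z) for z in readings]
```

Rejecting a measurement outright is only one gating policy; inflating the measurement variance for suspicious readings is a common softer alternative.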

If this is right

The UAV achieves superior collision avoidance against aggressive dynamic threats.
Energy consumption decreases compared with prior reinforcement learning approaches.
Flight trajectories become smoother while still reaching mission goals.
Hard safety constraints are maintained without sacrificing overall maneuverability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same constraint-plus-filter structure could be tested on ground robots or autonomous cars that must thread through moving traffic.
Further work might replace the simulated noise model with real sensor traces to check whether the filter still stabilizes the policy.
Multi-agent versions could let several drones share the same relational evolution mechanism to coordinate around shared dynamic obstacles.
The Lagrangian relaxation step suggests a route to add new mission constraints, such as no-fly zones, without retraining the entire policy from scratch.

Load-bearing premise

The Adaptive Trajectory Relational Evolution Mechanism accurately captures long-range obstacle intentions and the Physically Aware Gated Kalman Filter sufficiently mitigates non-stationary sensor noise and intent uncertainty under real-world conditions.

What would settle it

A real-world UAV flight test in which the vehicle collides with a fast-moving obstacle whose trajectory was predicted by the system or consumes more energy than a baseline method while completing the same mission.

Figures

Figures reproduced from arXiv: 2605.00059 by Jingtang Chen, Mingjian Fu, Tiantian Li, Wentao Chen, Wenxi Liu, Youfeng Su, Yuanlong Yu.

**Figure 2.** This figure demonstrates the comparison of total…

**Figure 3.** Ablation experiment results. The impact of each…

Abstract

Deep reinforcement learning (DRL) finds extensive application in autonomous drone navigation within complex, high-risk environments. However, its practical deployment faces a safety-exploration dilemma: soft penalty mechanisms encourage risky trial-and-error, while most constraint-based methods suffer degraded performance under sensor noise and intent uncertainty. We propose Dynamic-TD3, a physically enhanced framework that enforces strict safety constraints while maintaining maneuverability by modeling navigation as a Constrained Markov Decision Process (CMDP). This framework integrates an Adaptive Trajectory Relational Evolution Mechanism (ATREM) to capture long-range intentions and employs a Physically Aware Gated Kalman Filter (PAG-KF) to mitigate non-stationary observation noise. The resulting state representation drives a dual-criterion policy that balances mission efficiency against hard safety constraints via Lagrangian relaxation. In experiments with aggressive dynamic threats, this approach demonstrates superior collision avoidance performance, reduced energy consumption, and smoother flight trajectories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Dynamic-TD3, a DRL framework for UAV path planning in dynamic environments. It models navigation as a Constrained Markov Decision Process (CMDP), introduces the Adaptive Trajectory Relational Evolution Mechanism (ATREM) to capture long-range obstacle intentions, and the Physically Aware Gated Kalman Filter (PAG-KF) to mitigate non-stationary sensor noise and intent uncertainty. A dual-criterion policy is optimized via TD3 with Lagrangian relaxation to enforce hard safety constraints while pursuing mission efficiency. Experiments in simulated scenarios with aggressive dynamic threats are reported to yield superior collision avoidance, reduced energy consumption, and smoother trajectories relative to baselines.

Significance. If the performance gains can be rigorously attributed to the proposed components through controlled experiments and the methods are fully specified with equations and implementation details, the work could advance safe RL applications in robotics by addressing the safety-exploration tradeoff under realistic uncertainty. The combination of physical modeling with constrained policy optimization is a constructive direction. However, the absence of quantitative results, baselines, error analysis, and component ablations in the available text prevents a full assessment of novelty or impact.

major comments (2)

[Results] Results section: The central claim of experimental superiority in collision avoidance, energy use, and trajectory smoothness is not supported by ablation studies that isolate the contributions of ATREM and PAG-KF (e.g., full model vs. CMDP+TD3 without ATREM, without PAG-KF). Without such disaggregation, performance deltas cannot be attributed to the named innovations rather than hyperparameter choices or the base constrained RL formulation, undermining the abstract's positioning of these modules as the solution to the safety-exploration dilemma.
[Methods] Methods section: No equations, pseudocode, or implementation details are provided for ATREM (trajectory relational evolution), PAG-KF (gated Kalman filtering), or the Lagrangian relaxation update within the dual-criterion TD3 policy. This prevents verification of how long-range intentions are captured or how non-stationary noise is mitigated, making the soundness of the framework impossible to evaluate from the manuscript.
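As a point of reference for the second comment, a Lagrangian multiplier update in constrained RL is commonly a projected gradient-ascent step on the dual variable. The sketch below shows that generic pattern only. The function names, learning rate, cost limit, and cost values are hypothetical, and nothing here is taken from the manuscript.

```python
def update_multiplier(lam: float, avg_cost: float, cost_limit: float, lr: float = 0.05) -> float:
    """Projected dual ascent: raise lam when the constraint is violated, never below zero."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def penalized_return(reward: float, cost: float, lam: float) -> float:
    """Scalar objective an actor would maximize for a fixed multiplier."""
    return reward - lam * cost


lam = 0.0
# Hypothetical per-epoch average episode costs (for example, safety violations per episode).
for avg_cost in [0.8, 0.6, 0.3, 0.1, 0.05]:
    lam = update_multiplier(lam, avg_cost, cost_limit=0.2)
```

In an actor-critic setting this usually pairs with separate reward and cost critics, with the actor maximizing the reward estimate minus `lam` times the cost estimate; whether Dynamic-TD3 follows that layout cannot be confirmed from the text available here.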

minor comments (2)

[Abstract] Abstract: The phrase 'superior collision avoidance performance' is stated without reference to specific metrics (e.g., collision rate, minimum distance) or the exact baselines compared, reducing clarity for readers.
[Introduction] The manuscript would benefit from a dedicated related-work subsection contrasting the proposed CMDP+Lagrangian approach with prior constrained RL methods for UAVs to better highlight incremental novelty.
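The metrics mentioned in the first minor comment can be pinned down precisely. The following sketch gives one plausible set of definitions for minimum separation, collision rate, and a path-smoothness cost. These definitions, the 2D point representation, and the sample paths are assumptions for illustration and are not the paper's evaluation protocol.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def min_separation(uav: Sequence[Point], obstacle: Sequence[Point]) -> float:
    """Smallest UAV-obstacle distance over time-aligned samples."""
    return min(math.dist(p, q) for p, q in zip(uav, obstacle))


def collision_rate(min_distances: Sequence[float], safety_radius: float) -> float:
    """Fraction of episodes whose minimum separation fell below the safety radius."""
    hits = sum(1 for d in min_distances if d < safety_radius)
    return hits / len(min_distances)


def smoothness_cost(path: Sequence[Point]) -> float:
    """Sum of squared second differences along the path; lower means smoother."""
    total = 0.0
    for a, b, c in zip(path, path[1:], path[2:]):
        ax = a[0] - 2.0 * b[0] + c[0]
        ay = a[1] - 2.0 * b[1] + c[1]
        total += ax * ax + ay * ay
    return total


# Two hypothetical UAV paths and one obstacle track sampled at the same time steps.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
obstacle = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]

separations = [min_separation(straight, obstacle), min_separation(zigzag, obstacle)]
rate = collision_rate(separations, safety_radius=1.5)
```

Reporting such quantities alongside the named baselines would let readers check what "superior collision avoidance" means numerically.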

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve clarity and rigor.

Referee: [Results] Results section: The central claim of experimental superiority in collision avoidance, energy use, and trajectory smoothness is not supported by ablation studies that isolate the contributions of ATREM and PAG-KF (e.g., full model vs. CMDP+TD3 without ATREM, without PAG-KF). Without such disaggregation, performance deltas cannot be attributed to the named innovations rather than hyperparameter choices or the base constrained RL formulation, undermining the abstract's positioning of these modules as the solution to the safety-exploration dilemma.

Authors: We agree that explicit ablation studies are needed to rigorously attribute performance gains to ATREM and PAG-KF rather than the base CMDP+TD3 formulation. The current experiments report overall improvements against baselines but lack component-wise disaggregation. In the revised manuscript we will add controlled ablations (full model vs. without ATREM, without PAG-KF) with quantitative metrics on collision rate, energy, and smoothness to support the claims. revision: yes

Referee: [Methods] Methods section: No equations, pseudocode, or implementation details are provided for ATREM (trajectory relational evolution), PAG-KF (gated Kalman filtering), or the Lagrangian relaxation update within the dual-criterion TD3 policy. This prevents verification of how long-range intentions are captured or how non-stationary noise is mitigated, making the soundness of the framework impossible to evaluate from the manuscript.

Authors: We acknowledge the absence of equations, pseudocode, and implementation details in the current draft. The revision will include the full mathematical definitions of ATREM for relational trajectory evolution, the PAG-KF gating and update rules, and the Lagrangian multiplier updates for the dual-criterion TD3 policy, together with algorithm pseudocode, to enable verification and reproducibility. revision: yes
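The controlled ablations promised in the first response amount to a small on/off grid over the two modules. A hedged sketch of how such a grid could be enumerated is shown below; the helper names and the `base` label for the CMDP+TD3 backbone are hypothetical and do not come from the manuscript.

```python
from itertools import product
from typing import Dict, List


def ablation_variants(components: List[str]) -> List[Dict[str, bool]]:
    """Enumerate every on/off combination of the named components."""
    return [dict(zip(components, flags))
            for flags in product([True, False], repeat=len(components))]


def variant_name(variant: Dict[str, bool]) -> str:
    """Readable label, e.g. 'base+ATREM' for a run without PAG-KF."""
    enabled = [name for name, on in variant.items() if on]
    return "+".join(["base"] + enabled)


variants = ablation_variants(["ATREM", "PAG-KF"])
names = [variant_name(v) for v in variants]
```

Running each variant with identical seeds and hyperparameters is what allows a performance difference to be attributed to a module rather than to tuning.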

Circularity Check

0 steps flagged

No circularity detected; proposal uses standard CMDP/Lagrangian methods without self-referential derivations

The abstract presents Dynamic-TD3 as a framework integrating ATREM and PAG-KF into CMDP with Lagrangian relaxation for a dual-criterion policy. Lagrangian relaxation is a well-established constrained optimization technique, not derived or redefined here. No equations, derivations, or self-citations appear in the provided text that reduce any claimed result to its inputs by construction. The experimental claims concern performance in simulations but do not involve fitted parameters renamed as predictions or ansatzes smuggled via self-citation. The full manuscript may contain additional details, but the available derivation chain is self-contained and does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The abstract relies on domain assumptions about CMDP modeling and introduces two new mechanisms without providing independent evidence or derivations; no free parameters are explicitly named.

axioms (2)

domain assumption: UAV navigation can be effectively modeled as a Constrained Markov Decision Process with enforceable hard safety constraints.
Central modeling choice stated in the abstract.
ad hoc to paper: The Adaptive Trajectory Relational Evolution Mechanism can capture long-range intentions of dynamic obstacles.
New component introduced without supporting derivation or prior validation mentioned.

invented entities (2)

Adaptive Trajectory Relational Evolution Mechanism (ATREM): no independent evidence
purpose: Capture long-range obstacle trajectory intentions for improved state representation.
New mechanism proposed in the paper.
Physically Aware Gated Kalman Filter (PAG-KF): no independent evidence
purpose: Mitigate non-stationary observation noise using physical awareness.
New filter variant introduced in the paper.

