NavRL++: A System-Level Framework for Improving Sim-to-Real Transfer in Reinforcement Learning-Based Robot Navigation
Pith reviewed 2026-05-20 19:11 UTC · model grok-4.3
The pith
Perturbation-aware fine-tuning and a Transformer policy let RL navigation transfer zero-shot from sim to real robots by modeling noise, latency, and control gaps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a system-level framework built around systematic identification of four domain discrepancies, followed by perturbation-aware fine-tuning to account for them and a Transformer-based temporal reasoning policy using short-horizon observations, enables reinforcement learning navigation policies to achieve effective zero-shot sim-to-real transfer, outperforming learning-based baselines across environments and reaching performance levels comparable to optimization-based planners in static settings.
What carries the argument
Perturbation-aware fine-tuning, a post-training adaptation step that explicitly incorporates the four identified sim-to-real discrepancies, together with a Transformer-based policy that processes short sequences of recent observations to improve temporal reasoning and control smoothness.
If this is right
- Policies trained in simulation can be deployed directly on physical robots across aerial and legged platforms for navigation tasks.
- The method outperforms other learning-based navigation approaches in both static and dynamic simulated environments.
- In static environments the learned policies reach performance levels comparable to optimization-based planners without requiring online optimization.
- The combination of fine-tuning and temporal policy reduces the impact of real-world sensor and control imperfections during deployment.
Where Pith is reading between the lines
- The same discrepancy-modeling approach could be applied to other robotic control problems that currently suffer from sim-to-real gaps.
- If the fine-tuning step proves cheap to run, it might reduce reliance on large-scale real-world data collection for RL robotics.
- Extending the short-horizon Transformer reasoning to longer sequences could support more complex multi-step navigation behaviors.
Load-bearing premise
That the four listed discrepancies are the dominant factors to model and that explicitly fine-tuning for them after training will produce policies that generalize to new real-world settings and tasks without further adaptation.
What would settle it
A real-world test on an unseen robot platform or environment where navigation performance drops sharply after applying the described fine-tuning and policy would show the transfer is not as robust as claimed.
Figures
read the original abstract
Recent years have witnessed significant progress in autonomous navigation using reinforcement learning. However, existing approaches largely emphasize reinforcement learning framework design, such as input representations, action spaces, and reward functions, while providing limited analysis of sim-to-real transfer and insufficient insight into how training strategies affect real-world deployment performance. To bridge this gap, we not only introduce an effective RL framework but also present a complete training and deployment pipeline, along with a systematic empirical study that disentangles the key factors affecting sim-to-real transfer in reinforcement learning-based navigation, including sensor noise, perception failures, system latency, and control response. Building on insights from this analysis, we introduce perturbation-aware fine-tuning, a post-training adaptation strategy that improves transfer robustness by explicitly accounting for empirically identified domain discrepancies. To further mitigate perception degradation and enhance control smoothness in real-world deployment, we propose a Transformer-based temporal reasoning policy that leverages short-horizon observation for navigation control. We quantitatively evaluate how individual sim-to-real perturbations and training design choices impact navigation performance across environments. Experimental results demonstrate that the proposed training strategy and policy architecture outperform learning-based baselines in both static and dynamic environments, while achieving performance comparable to optimization-based planners in static settings. We validate our approach through real-world deployment on multiple robotic platforms, including aerial and legged robots, across navigation-centric tasks such as exploration and inspection, demonstrating zero-shot sim-to-real transfer.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents NavRL++, a system-level framework for RL-based robot navigation. It includes a complete training and deployment pipeline, a systematic empirical study disentangling sim-to-real factors (sensor noise, perception failures, system latency, control response), perturbation-aware fine-tuning as a post-training adaptation strategy, and a Transformer-based temporal reasoning policy using short-horizon observations. The central claims are that the approach outperforms learning-based baselines in static and dynamic environments, matches optimization-based planners in static settings, and achieves zero-shot sim-to-real transfer validated on aerial and legged robots for exploration and inspection tasks.
Significance. If the performance claims and zero-shot transfer results hold under rigorous controls, the work would offer practical value to the robotics community by providing a systematic dissection of domain discrepancies and a targeted fine-tuning method that reduces reliance on real-world adaptation. The multi-platform real-world validation and comparison to both learning and optimization baselines are positive elements that could inform future sim-to-real pipelines in navigation.
major comments (2)
- [Experimental Results / Evaluation sections] The abstract and described experimental pipeline report quantitative improvements and successful zero-shot deployments but provide no details on the number of trials, statistical significance testing, variance across runs, or precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, as the reader's strongest claim and skeptic note both hinge on whether these factors are dominant and accurately matched to real statistics.
- [Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.
minor comments (2)
- [Policy Architecture] Notation for the Transformer policy's short-horizon observation window and temporal reasoning components could be clarified with explicit equations or pseudocode to improve reproducibility.
- [Conclusion] The manuscript would benefit from a dedicated limitations paragraph discussing the scope of environments and tasks tested, to temper the generalization claims.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to improve experimental transparency and add supporting analyses where needed.
read point-by-point responses
-
Referee: [Experimental Results / Evaluation sections] The abstract and described experimental pipeline report quantitative improvements and successful zero-shot deployments but provide no details on the number of trials, statistical significance testing, variance across runs, or precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, as the reader's strongest claim and skeptic note both hinge on whether these factors are dominant and accurately matched to real statistics.
Authors: We agree that explicit reporting of trial counts, variance, statistical tests, and precise perturbation distributions is necessary to support the robustness claims. The current manuscript does not include these details at the requested level of specificity. In the revision we will expand the Experimental Results and Setup sections to report the number of independent trials conducted for each condition, include mean performance with standard deviation across runs, add statistical significance testing (e.g., t-tests with p-values for key comparisons), and specify the exact distributions and parameters used for each of the four discrepancies, calibrated from real-robot measurements. These additions will be placed in the main text and supplementary material. revision: yes
-
Referee: [Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.
Authors: We acknowledge that the current presentation lacks explicit sensitivity analysis to parameter misspecification and unmodeled effects. While the empirical study already isolates the contribution of each modeled factor, we agree this does not fully address robustness to imperfect modeling. In the revision we will add a dedicated sensitivity analysis subsection (or appendix) that reports policy performance under deliberate ±20-30% perturbations to each discrepancy parameter and under additional unmodeled effects such as actuator nonlinearities and lighting changes. These results will be used to qualify the zero-shot transfer claims. revision: yes
Circularity Check
No circularity: claims rest on empirical measurements and deployments without self-referential derivations or fitted predictions
full rationale
The paper describes an RL navigation framework, perturbation-aware fine-tuning, and a Transformer policy, supported by quantitative experiments and real-world deployments on aerial/legged robots. No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or ansatzes from the same work. The four domain discrepancies are identified via analysis and then explicitly modeled, but this is an empirical design choice rather than a definitional loop. Central performance claims are validated externally via physical tests, satisfying independence criteria.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
perturbation-aware fine-tuning... explicitly accounting for empirically identified domain discrepancies... sensor noise, perception failures, system latency, and control response
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction and embed_strictMono unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Transformer-based temporal reasoning policy... short-horizon observation histories
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Ziran Li, Yanwen Zhang, Hao Wu, Satoshi Suzuki, Akio Namiki, and Wei Wang. Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023
work page 2023
-
[2]
Won Joon Yun, Soohyun Park, Joongheon Kim, Myung- Jae Shin, Soyi Jung, David A Mohaisen, and Jae-Hyun Kim. Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022
work page 2022
-
[3]
Boyu Zhou, Hao Xu, and Shaojie Shen. Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023
work page 2023
-
[4]
Boyu Zhou, Fei Gao, Luqi Wang, Chuhao Liu, and Shaojie Shen. Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019. doi: 10.1109/LRA.2019.2927938
-
[5]
IEEE Robotics and Automation Letters5(2), 3422–3429 (2020) https://doi.org/10.1109/LRA.2020
Xin Zhou, Zhepei Wang, Hongkai Ye, Chao Xu, and Fei Gao. Ego-planner: An esdf-free gradient-based local planner for quadrotors.IEEE Robotics and Automation Letters, 6(2):478–485, 2021. doi: 10.1109/LRA.2020. 3047728
-
[6]
Far planner: Fast, attemptable route planner using dynamic visibility update
Fan Yang, Chao Cao, Hongbiao Zhu, Jean Oh, and Ji Zhang. Far planner: Fast, attemptable route planner using dynamic visibility update. In2022 ieee/rsj interna- tional conference on intelligent robots and systems (iros), pages 9–16. IEEE, 2022
work page 2022
-
[7]
Minghao Lu, Xiyu Fan, Han Chen, and Peng Lu. Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024
work page 2024
-
[8]
Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999
work page 1999
-
[9]
Benchmarking deep reinforcement learn- ing for continuous control
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learn- ing for continuous control. InInternational conference on machine learning, pages 1329–1338. PMLR, 2016
work page 2016
-
[10]
Dong Hu, Chao Huang, Jingda Wu, and Xin Yuan. To- ward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy.IEEE Transactions on Intelligent Transportation Systems, 2025
work page 2025
-
[11]
Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, and Guanya Shi. Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024
-
[12]
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Mu ˜noz, Xinjie Yao, Ren ´e Zurbr ¨ugg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi- modal robot learning.arXiv preprint arXiv:2511.04831, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[13]
Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021
Antonio Loquercio, Elia Kaufmann, Ren ´e Ranftl, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021
work page 2021
-
[14]
Zhanteng Xie and Philip Dames. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023. doi: 10.1109/TRO.2023.3257549
-
[15]
Junjie Lu, Xuewei Zhang, Hongming Shen, Liwen Xu, and Bailing Tian. You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024
work page 2024
-
[16]
Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, and Kenji Shimada. Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025. doi: 10.1109/LRA.2025.3546069
-
[17]
Deep Neural Network for Real-Time Autonomous Indoor Navigation
Dong Ki Kim and Tsuhan Chen. Deep neural network for real-time autonomous indoor navigation.arXiv preprint arXiv:1511.04668, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[18]
A deep-network solution towards model-less obstacle avoidance
Lei Tai, Shaohua Li, and Ming Liu. A deep-network solution towards model-less obstacle avoidance. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2759–2764, 2016. doi: 10. 1109/IROS.2016.7759428
-
[19]
Nikolai Smolyanskiy, Alexey Kamenev, Jeffrey Smith, and Stan Birchfield. Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4241–4247. IEEE, 2017
work page 2017
-
[20]
Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learn- ing to fly by crashing. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3948–3955. IEEE, 2017
work page 2017
-
[21]
Ram Prasad Padhy, Sachin Verma, Shahzad Ahmad, Suman Kumar Choudhury, and Pankaj Kumar Sa. Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018
work page 2018
-
[22]
Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles
Arbaaz Khan and Martial Hebert. Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles. In2018 IEEE Aerospace Conference, pages 1–9. IEEE, 2018
work page 2018
-
[23]
A deep learn- ing approach towards autonomous flight in forest envi- ronments
Sinahi Dionisio-Ortega, L Oyuki Rojas-Perez, Jose Martinez-Carranza, and Israel Cruz-Vega. A deep learn- ing approach towards autonomous flight in forest envi- ronments. In2018 International Conference on Electron- ics, Communications and Computers (CONIELECOMP), pages 139–144. IEEE, 2018
work page 2018
-
[24]
Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018
Antonio Loquercio, Ana I Maqueda, Carlos R Del- Blanco, and Davide Scaramuzza. Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018
work page 2018
-
[25]
Varun Tolani, Somil Bansal, Aleksandra Faust, and Claire Tomlin. Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021
work page 2021
-
[26]
ViNT: A foundation model for visual navigation,
Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. Vint: A foundation model for visual navigation. arXiv preprint arXiv:2306.14846, 2023
-
[27]
Nomad: Goal masked diffusion policies for navigation and exploration
Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. Nomad: Goal masked diffusion policies for navigation and exploration. In2024 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 63–70. IEEE, 2024
work page 2024
-
[28]
Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, and Weiyao Lin. Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025
work page 2025
-
[29]
Neupan: Direct point robot navigation with end-to-end model-based learning
Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Jianjun Chen, Shijie Lin, Chengyang Li, Chengzhong Xu, Yonina C Eldar, Qi Hao, et al. Neupan: Direct point robot navigation with end-to-end model-based learning. IEEE Transactions on Robotics, 2025
work page 2025
-
[30]
Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, and Jiangmiao Pang. Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025
-
[31]
Sunggoo Jung, Sunyou Hwang, Heemin Shin, and David Hyunchul Shim. Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018
work page 2018
-
[32]
Mononav: Mav navigation via monocular depth estimation and reconstruction
Nathaniel Simon and Anirudha Majumdar. Mononav: Mav navigation via monocular depth estimation and reconstruction. InInternational Symposium on Exper- imental Robotics, pages 415–426. Springer, 2023
work page 2023
-
[33]
Task-driven com- pression for collision encoding based on depth images
Mihir Kulkarni and Kostas Alexis. Task-driven com- pression for collision encoding based on depth images. InInternational Symposium on Visual Computing, pages 259–273. Springer, 2023
work page 2023
-
[34]
CAD2RL: Real Single-Image Flight without a Single Real Image
Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image.arXiv preprint arXiv:1611.04201, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[35]
Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning
Linhai Xie, Sen Wang, Andrew Markham, and Niki Trigoni. Towards monocular vision based obstacle avoid- ance through deep reinforcement learning.arXiv preprint arXiv:1706.09829, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[36]
Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning
Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P How. Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 285–292. IEEE, 2017
work page 2017
-
[37]
Socially aware motion planning with deep re- inforcement learning
Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep re- inforcement learning. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017
work page 2017
-
[38]
A vision based deep reinforcement learning algorithm for uav obstacle avoidance
Jeremy Roghair, Amir Niaraki, Kyungtae Ko, and Ali Jannesari. A vision based deep reinforcement learning algorithm for uav obstacle avoidance. InProceedings of SAI Intelligent Systems Conference, pages 115–128. Springer, 2021
work page 2021
-
[39]
Abhik Singla, Sindhu Padakandla, and Shalabh Bhatna- gar. Memory-based deep reinforcement learning for ob- stacle avoidance in uav with limited environment knowl- edge.IEEE transactions on intelligent transportation systems, 22(1):107–118, 2019
work page 2019
-
[40]
A vision- based irregular obstacle avoidance framework via deep reinforcement learning
Lingping Gao, Jianchuan Ding, Wenxi Liu, Haiyin Piao, Yuxin Wang, Xin Yang, and Baocai Yin. A vision- based irregular obstacle avoidance framework via deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9262–9269. IEEE, 2021
work page 2021
-
[41]
Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023
Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023
work page 2023
-
[42]
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[43]
Zhanteng Xie and Philip Dames. Drl-vo: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023
work page 2023
-
[44]
Zifan Wang, Xun Yang, Jianzhuang Zhao, Jiaming Zhou, Teli Ma, Ziyao Gao, Arash Ajoudani, and Junwei Liang. End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025
-
[45]
Qihan Qi, Hai Lin, Xinsong Yang, Yaping Sun, Xingxing Ju, and Wenwu Yu. Distributed collision-free control of mass by combining reinforcement learning with fil- tered position barrier certificates and applications.IEEE Transactions on Automation Science and Engineering, 22:23849–23860, 2025
work page 2025
-
[46]
Zifan Wang, Teli Ma, Yufei Jia, Xun Yang, Jiaming Zhou, Wenlong Ouyang, Qiang Zhang, and Junwei Liang. Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments.arXiv preprint arXiv:2505.19214, 2025
-
[47]
Bowen Xu, Zexuan Yan, Minghao Lu Member, Xiyu Fan, Yi Luo, Youshen Lin, Zhiqiang Chen, Yeke Chen, 18 Qiyuan Qiao, and Peng Lu. Flow-aided flight through dynamic clutters from point to motion.IEEE Robotics and Automation Letters, 11(1):218–225, 2025
work page 2025
-
[48]
Vision-guided collision avoidance through deep reinforcement learning
Sirui Song, Yuanhang Zhang, Xi Qin, Kirk Saunders, and Jundong Liu. Vision-guided collision avoidance through deep reinforcement learning. InNAECON 2021-IEEE National Aerospace and Electronics Conference, pages 191–194. IEEE, 2021
work page 2021
-
[49]
Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021
Zhihan Xue and Tad Gonsalves. Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021
work page 2021
-
[50]
Minwoo Kim, Jongyun Kim, Minjae Jung, and Hyon- dong Oh. Towards monocular vision-based autonomous flight through deep reinforcement learning.Expert Sys- tems with Applications, 198:116742, 2022
work page 2022
-
[51]
Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning
Raffaele Brilli, Marco Legittimo, Francesco Crocetti, Mirko Leomanni, Mario Luca Fravolini, and Gabriele Costante. Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 12535–12541. IEEE, 2023
work page 2023
-
[52]
Maria Krinner, Angel Romero, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron, and Davide Scara- muzza. Time-optimal flight with safety constraints and datadriven dynamics.arXiv preprint arXiv:2403.17551, 2024
-
[53]
Actor-critic model predictive control
Angel Romero, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14777–14784. IEEE, 2024
work page 2024
-
[54]
Yongkui Liu, He Xu, Ding Liu, and Lihui Wang. A digi- tal twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping.Robotics and Computer-Integrated Manufacturing, 78:102365, 2022
work page 2022
-
[55]
TRANSIC: Sim-to-real policy transfer by learning from online correction
Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, and Li Fei-Fei. TRANSIC: Sim-to-real policy transfer by learning from online correction. In8th Annual Conference on Robot Learning, 2024. URL https:// openreview.net/forum?id=lpjPft4RQT
work page 2024
-
[56]
Paul Maria Scheikl, Eleonora Tagliabue, Bal ´azs Gyenes, Martin Wagner, Diego Dall’Alba, Paolo Fiorini, and Franziska Mathis-Ullrich. Sim-to-real transfer for visual reinforcement learning of deformable object manipu- lation for robot-assisted surgery.IEEE Robotics and Automation Letters, 8(2):560–567, 2022
work page 2022
-
[57]
Jingda Wu, Yanxin Zhou, Haohan Yang, Zhiyu Huang, and Chen Lv. Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(12):14745–14759, 2023
work page 2023
-
[58]
Michael Everett, Yu Fan Chen, and Jonathan P How. Collision avoidance in pedestrian-rich environments with deep reinforcement learning.Ieee Access, 9:10357– 10377, 2021
work page 2021
-
[59]
Yunlong Song, Kexin Shi, Robert Penicka, and Da- vide Scaramuzza. Learning perception-aware ag- ile flight in cluttered environments.arXiv preprint arXiv:2210.01841, 2022
-
[60]
David Hoeller, Lorenz Wellhausen, Farbod Farshidian, and Marco Hutter. Learning a state representation and navigation in cluttered and dynamic environments.IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021
work page 2021
-
[61]
Contrastive learn- ing for enhancing robust scene transfer in vision-based agile flight
Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chun- wei Xing, and Davide Scaramuzza. Contrastive learn- ing for enhancing robust scene transfer in vision-based agile flight. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5330–5337. IEEE, 2024
work page 2024
-
[62]
Dropo: Sim-to-real transfer with offline domain randomization
Gabriele Tiboni, Karol Arndt, and Ville Kyrki. Dropo: Sim-to-real transfer with offline domain randomization. Robotics and Autonomous Systems, 166:104432, 2023
work page 2023
-
[63]
Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator.IEEE Transactions on Robotics, 34(4):1004– 1020, 2018. doi: 10.1109/TRO.2018.2853729
-
[64]
Fast-lio2: Fast direct lidar-inertial odometry
Wei Xu, Yixi Cai, Dongjiao He, Jiarong Lin, and Fu Zhang. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Transactions on Robotics, 38(4):2053–2073, 2022. doi: 10.1109/TRO.2022.3141876
-
[65]
Po-Wei Chou, Daniel Maturana, and Sebastian Scherer. Improving stochastic policy gradients in continuous con- trol with deep reinforcement learning using the beta distribution. InInternational conference on machine learning, pages 834–843. PMLR, 2017
work page 2017
-
[66]
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation.arXiv preprint arXiv:1506.02438, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[67]
Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments
Jiahao Lin, Hai Zhu, and Javier Alonso-Mora. Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments. In2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2682–2688. IEEE, 2020
work page 2020
-
[68]
Zhefan Xu, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, and Kenji Shimada. Onboard dynamic-object detection and tracking for autonomous robot navigation with rgb-d camera.IEEE Robotics and Automation Letters, 9(1):651–658, 2023
work page 2023
-
[69]
Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles.The international journal of robotics research, 17(7):760– 772, 1998
work page 1998
-
[70]
ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions
Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, and Kenji Shimada. Vision-aided uav navigation and dynamic obstacle avoidance using gradient-based b- spline trajectory optimization. In2023 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 1214–1220, 2023. doi: 10.1109/ICRA48891.2023. 10160638
-
[71]
Zhefan Xu, Di Deng, and Kenji Shimada. Autonomous uav exploration of dynamic environments via incremental sampling and probabilistic roadmap.IEEE Robotics and Automation Letters, 6(2):2729–2736, 2021. doi: 10.1109/ LRA.2021.3062008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.