pith. sign in

arxiv: 2605.15559 · v1 · pith:SL3GKCX4new · submitted 2026-05-15 · 💻 cs.RO

NavRL++: A System-Level Framework for Improving Sim-to-Real Transfer in Reinforcement Learning-Based Robot Navigation

Pith reviewed 2026-05-20 19:11 UTC · model grok-4.3

classification 💻 cs.RO
keywords sim-to-real transferreinforcement learningrobot navigationperturbation-aware fine-tuningTransformer policyzero-shot transferaerial robotslegged robots
0
0 comments X

The pith

Perturbation-aware fine-tuning and a Transformer policy let RL navigation transfer zero-shot from sim to real robots by modeling noise, latency, and control gaps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a full training and deployment pipeline for reinforcement learning robot navigation that goes beyond typical framework tweaks to focus on sim-to-real transfer. Through experiments it isolates how sensor noise, perception failures, system latency, and control response degrade performance, then introduces perturbation-aware fine-tuning to adapt policies after initial training. A Transformer-based policy that reasons over short recent observations further reduces perception issues and smooths control. These changes produce policies that beat standard learning methods in both static and dynamic simulated settings, match classical planners in static cases, and deploy successfully on real aerial and legged robots for exploration and inspection without any additional real-world updates. A reader would care because the work supplies practical, measurable steps to close the simulation-to-reality gap that has long limited RL in physical robots.

Core claim

The paper claims that a system-level framework built around systematic identification of four domain discrepancies, followed by perturbation-aware fine-tuning to account for them and a Transformer-based temporal reasoning policy using short-horizon observations, enables reinforcement learning navigation policies to achieve effective zero-shot sim-to-real transfer, outperforming learning-based baselines across environments and reaching performance levels comparable to optimization-based planners in static settings.

What carries the argument

Perturbation-aware fine-tuning, a post-training adaptation step that explicitly incorporates the four identified sim-to-real discrepancies, together with a Transformer-based policy that processes short sequences of recent observations to improve temporal reasoning and control smoothness.

If this is right

  • Policies trained in simulation can be deployed directly on physical robots across aerial and legged platforms for navigation tasks.
  • The method outperforms other learning-based navigation approaches in both static and dynamic simulated environments.
  • In static environments the learned policies reach performance levels comparable to optimization-based planners without requiring online optimization.
  • The combination of fine-tuning and temporal policy reduces the impact of real-world sensor and control imperfections during deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same discrepancy-modeling approach could be applied to other robotic control problems that currently suffer from sim-to-real gaps.
  • If the fine-tuning step proves cheap to run, it might reduce reliance on large-scale real-world data collection for RL robotics.
  • Extending the short-horizon Transformer reasoning to longer sequences could support more complex multi-step navigation behaviors.

Load-bearing premise

That the four listed discrepancies are the dominant factors to model and that explicitly fine-tuning for them after training will produce policies that generalize to new real-world settings and tasks without further adaptation.

What would settle it

A real-world test on an unseen robot platform or environment where navigation performance drops sharply after applying the described fine-tuning and policy would show the transfer is not as robust as claimed.

Figures

Figures reproduced from arXiv: 2605.15559 by Hanyu Jin, Kenji Shimada, Zhefan Xu.

Figure 1
Figure 1. Figure 1: Overview of NavRL++ real-world deployment. The proposed system supports zero-shot deployment across multiple environments and robotic platforms, including quadruped and aerial robots. The results highlight robust navigation in dynamic, cluttered, and task-oriented scenarios such as pedestrian collision avoidance, surface inspection, and unknown exploration, demonstrating deployment across multiple robotic … view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of the robotic platforms used to deploy our RL navigation [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the proposed reinforcement learning–based autonomous navigation framework. The system supports camera-only, LiDAR-only, and [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Illustration of static obstacle and action representation. (a) Occupancy [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Illustration of the proposed training strategy. (a) The policy is [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Illustration of the perception module. (a) Raw sensor data, including [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Overall performance benchmarking among baseline RL approaches. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visualization of simulation evaluation with 100 (out of 10,000) [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Effect of individual perturbations and their combinations on success rate and control effort of a policy trained in clean simulation. [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Navigation success rate evaluation at each curriculum training stage. [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of navigation success rates between policies trained with [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Comparison of training success rates under varying training scales. [PITH_FULL_IMAGE:figures/full_fig_p013_12.png] view at source ↗
Figure 14
Figure 14. Figure 14: Navigation in Isaac Sim. Static obstacles are visualized as red and [PITH_FULL_IMAGE:figures/full_fig_p013_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Gazebo navigation experiments with environment layouts and sample trajectories. The first column presents two randomly generated environments: [PITH_FULL_IMAGE:figures/full_fig_p014_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Navigation experiments in Gazebo simulated real-world scenarios. [PITH_FULL_IMAGE:figures/full_fig_p014_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Real-world navigation experiments on both aerial and legged robotic platforms. (a) Navigation in cluttered environments using a UAV, with the [PITH_FULL_IMAGE:figures/full_fig_p015_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Navigation-centric applications using the proposed framework. In both exploration and inspection tasks, the task planner generates global waypoints [PITH_FULL_IMAGE:figures/full_fig_p015_18.png] view at source ↗
read the original abstract

Recent years have witnessed significant progress in autonomous navigation using reinforcement learning. However, existing approaches largely emphasize reinforcement learning framework design, such as input representations, action spaces, and reward functions, while providing limited analysis of sim-to-real transfer and insufficient insight into how training strategies affect real-world deployment performance. To bridge this gap, we not only introduce an effective RL framework but also present a complete training and deployment pipeline, along with a systematic empirical study that disentangles the key factors affecting sim-to-real transfer in reinforcement learning-based navigation, including sensor noise, perception failures, system latency, and control response. Building on insights from this analysis, we introduce perturbation-aware fine-tuning, a post-training adaptation strategy that improves transfer robustness by explicitly accounting for empirically identified domain discrepancies. To further mitigate perception degradation and enhance control smoothness in real-world deployment, we propose a Transformer-based temporal reasoning policy that leverages short-horizon observation for navigation control. We quantitatively evaluate how individual sim-to-real perturbations and training design choices impact navigation performance across environments. Experimental results demonstrate that the proposed training strategy and policy architecture outperform learning-based baselines in both static and dynamic environments, while achieving performance comparable to optimization-based planners in static settings. We validate our approach through real-world deployment on multiple robotic platforms, including aerial and legged robots, across navigation-centric tasks such as exploration and inspection, demonstrating zero-shot sim-to-real transfer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents NavRL++, a system-level framework for RL-based robot navigation. It includes a complete training and deployment pipeline, a systematic empirical study disentangling sim-to-real factors (sensor noise, perception failures, system latency, control response), perturbation-aware fine-tuning as a post-training adaptation strategy, and a Transformer-based temporal reasoning policy using short-horizon observations. The central claims are that the approach outperforms learning-based baselines in static and dynamic environments, matches optimization-based planners in static settings, and achieves zero-shot sim-to-real transfer validated on aerial and legged robots for exploration and inspection tasks.

Significance. If the performance claims and zero-shot transfer results hold under rigorous controls, the work would offer practical value to the robotics community by providing a systematic dissection of domain discrepancies and a targeted fine-tuning method that reduces reliance on real-world adaptation. The multi-platform real-world validation and comparison to both learning and optimization baselines are positive elements that could inform future sim-to-real pipelines in navigation.

major comments (2)
  1. [Experimental Results / Evaluation sections] The abstract and described experimental pipeline report quantitative improvements and successful zero-shot deployments but provide no details on the number of trials, statistical significance testing, variance across runs, or precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, as the reader's strongest claim and skeptic note both hinge on whether these factors are dominant and accurately matched to real statistics.
  2. [Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.
minor comments (2)
  1. [Policy Architecture] Notation for the Transformer policy's short-horizon observation window and temporal reasoning components could be clarified with explicit equations or pseudocode to improve reproducibility.
  2. [Conclusion] The manuscript would benefit from a dedicated limitations paragraph discussing the scope of environments and tasks tested, to temper the generalization claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to improve experimental transparency and add supporting analyses where needed.

read point-by-point responses
  1. Referee: [Experimental Results / Evaluation sections] The abstract and described experimental pipeline report quantitative improvements and successful zero-shot deployments but provide no details on the number of trials, statistical significance testing, variance across runs, or precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, as the reader's strongest claim and skeptic note both hinge on whether these factors are dominant and accurately matched to real statistics.

    Authors: We agree that explicit reporting of trial counts, variance, statistical tests, and precise perturbation distributions is necessary to support the robustness claims. The current manuscript does not include these details at the requested level of specificity. In the revision we will expand the Experimental Results and Setup sections to report the number of independent trials conducted for each condition, include mean performance with standard deviation across runs, add statistical significance testing (e.g., t-tests with p-values for key comparisons), and specify the exact distributions and parameters used for each of the four discrepancies, calibrated from real-robot measurements. These additions will be placed in the main text and supplementary material. revision: yes

  2. Referee: [Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.

    Authors: We acknowledge that the current presentation lacks explicit sensitivity analysis to parameter misspecification and unmodeled effects. While the empirical study already isolates the contribution of each modeled factor, we agree this does not fully address robustness to imperfect modeling. In the revision we will add a dedicated sensitivity analysis subsection (or appendix) that reports policy performance under deliberate ±20-30% perturbations to each discrepancy parameter and under additional unmodeled effects such as actuator nonlinearities and lighting changes. These results will be used to qualify the zero-shot transfer claims. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical measurements and deployments without self-referential derivations or fitted predictions

full rationale

The paper describes an RL navigation framework, perturbation-aware fine-tuning, and a Transformer policy, supported by quantitative experiments and real-world deployments on aerial/legged robots. No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or ansatzes from the same work. The four domain discrepancies are identified via analysis and then explicitly modeled, but this is an empirical design choice rather than a definitional loop. Central performance claims are validated externally via physical tests, satisfying independence criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work implicitly relies on standard RL assumptions about reward design and environment modeling plus the unstated premise that the listed perturbations capture the main transfer gap.

pith-pipeline@v0.9.0 · 5786 in / 1190 out tokens · 47312 ms · 2026-05-20T19:11:35.807662+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 6 internal anchors

  1. [1]

    Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

    Ziran Li, Yanwen Zhang, Hao Wu, Satoshi Suzuki, Akio Namiki, and Wei Wang. Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

  2. [2]

    Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

    Won Joon Yun, Soohyun Park, Joongheon Kim, Myung- Jae Shin, Soyi Jung, David A Mohaisen, and Jae-Hyun Kim. Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

  3. [3]

    Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

    Boyu Zhou, Hao Xu, and Shaojie Shen. Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

  4. [4]

    Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019

    Boyu Zhou, Fei Gao, Luqi Wang, Chuhao Liu, and Shaojie Shen. Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019. doi: 10.1109/LRA.2019.2927938

  5. [5]

    IEEE Robotics and Automation Letters5(2), 3422–3429 (2020) https://doi.org/10.1109/LRA.2020

    Xin Zhou, Zhepei Wang, Hongkai Ye, Chao Xu, and Fei Gao. Ego-planner: An esdf-free gradient-based local planner for quadrotors.IEEE Robotics and Automation Letters, 6(2):478–485, 2021. doi: 10.1109/LRA.2020. 3047728

  6. [6]

    Far planner: Fast, attemptable route planner using dynamic visibility update

    Fan Yang, Chao Cao, Hongbiao Zhu, Jean Oh, and Ji Zhang. Far planner: Fast, attemptable route planner using dynamic visibility update. In2022 ieee/rsj interna- tional conference on intelligent robots and systems (iros), pages 9–16. IEEE, 2022

  7. [7]

    Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

    Minghao Lu, Xiyu Fan, Han Chen, and Peng Lu. Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

  8. [8]

    Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

    Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

  9. [9]

    Benchmarking deep reinforcement learn- ing for continuous control

    Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learn- ing for continuous control. InInternational conference on machine learning, pages 1329–1338. PMLR, 2016

  10. [10]

    Dong Hu, Chao Huang, Jingda Wu, and Xin Yuan. To- ward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy.IEEE Transactions on Intelligent Transportation Systems, 2025

  11. [11]

    Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

    Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, and Guanya Shi. Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

  12. [12]

    Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Mu ˜noz, Xinjie Yao, Ren ´e Zurbr ¨ugg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi- modal robot learning.arXiv preprint arXiv:2511.04831, 2025

  13. [13]

    Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

    Antonio Loquercio, Elia Kaufmann, Ren ´e Ranftl, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

  14. [14]

    DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

    Zhanteng Xie and Philip Dames. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023. doi: 10.1109/TRO.2023.3257549

  15. [15]

    You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

    Junjie Lu, Xuewei Zhang, Hongming Shen, Liwen Xu, and Bailing Tian. You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

  16. [16]

    Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025

    Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, and Kenji Shimada. Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025. doi: 10.1109/LRA.2025.3546069

  17. [17]

    Deep Neural Network for Real-Time Autonomous Indoor Navigation

    Dong Ki Kim and Tsuhan Chen. Deep neural network for real-time autonomous indoor navigation.arXiv preprint arXiv:1511.04668, 2015

  18. [18]

    A deep-network solution towards model-less obstacle avoidance

    Lei Tai, Shaohua Li, and Ming Liu. A deep-network solution towards model-less obstacle avoidance. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2759–2764, 2016. doi: 10. 1109/IROS.2016.7759428

  19. [19]

    Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness

    Nikolai Smolyanskiy, Alexey Kamenev, Jeffrey Smith, and Stan Birchfield. Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4241–4247. IEEE, 2017

  20. [20]

    Learn- ing to fly by crashing

    Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learn- ing to fly by crashing. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3948–3955. IEEE, 2017

  21. [21]

    Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

    Ram Prasad Padhy, Sachin Verma, Shahzad Ahmad, Suman Kumar Choudhury, and Pankaj Kumar Sa. Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

  22. [22]

    Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles

    Arbaaz Khan and Martial Hebert. Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles. In2018 IEEE Aerospace Conference, pages 1–9. IEEE, 2018

  23. [23]

    A deep learn- ing approach towards autonomous flight in forest envi- ronments

    Sinahi Dionisio-Ortega, L Oyuki Rojas-Perez, Jose Martinez-Carranza, and Israel Cruz-Vega. A deep learn- ing approach towards autonomous flight in forest envi- ronments. In2018 International Conference on Electron- ics, Communications and Computers (CONIELECOMP), pages 139–144. IEEE, 2018

  24. [24]

    Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

    Antonio Loquercio, Ana I Maqueda, Carlos R Del- Blanco, and Davide Scaramuzza. Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

  25. [25]

    Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

    Varun Tolani, Somil Bansal, Aleksandra Faust, and Claire Tomlin. Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

  26. [26]

    ViNT: A foundation model for visual navigation,

    Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. Vint: A foundation model for visual navigation. arXiv preprint arXiv:2306.14846, 2023

  27. [27]

    Nomad: Goal masked diffusion policies for navigation and exploration

    Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. Nomad: Goal masked diffusion policies for navigation and exploration. In2024 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 63–70. IEEE, 2024

  28. [28]

    Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

    Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, and Weiyao Lin. Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

  29. [29]

    Neupan: Direct point robot navigation with end-to-end model-based learning

    Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Jianjun Chen, Shijie Lin, Chengyang Li, Chengzhong Xu, Yonina C Eldar, Qi Hao, et al. Neupan: Direct point robot navigation with end-to-end model-based learning. IEEE Transactions on Robotics, 2025

  30. [30]

    Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

    Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, and Jiangmiao Pang. Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

  31. [31]

    Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

    Sunggoo Jung, Sunyou Hwang, Heemin Shin, and David Hyunchul Shim. Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

  32. [32]

    Mononav: Mav navigation via monocular depth estimation and reconstruction

    Nathaniel Simon and Anirudha Majumdar. Mononav: Mav navigation via monocular depth estimation and reconstruction. InInternational Symposium on Exper- imental Robotics, pages 415–426. Springer, 2023

  33. [33]

    Task-driven com- pression for collision encoding based on depth images

    Mihir Kulkarni and Kostas Alexis. Task-driven com- pression for collision encoding based on depth images. InInternational Symposium on Visual Computing, pages 259–273. Springer, 2023

  34. [34]

    CAD2RL: Real Single-Image Flight without a Single Real Image

    Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image.arXiv preprint arXiv:1611.04201, 2016

  35. [35]

    Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning

    Linhai Xie, Sen Wang, Andrew Markham, and Niki Trigoni. Towards monocular vision based obstacle avoid- ance through deep reinforcement learning.arXiv preprint arXiv:1706.09829, 2017

  36. [36]

    Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning

    Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P How. Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 285–292. IEEE, 2017

  37. [37]

    Socially aware motion planning with deep re- inforcement learning

    Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep re- inforcement learning. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017

  38. [38]

    A vision based deep reinforcement learning algorithm for uav obstacle avoidance

    Jeremy Roghair, Amir Niaraki, Kyungtae Ko, and Ali Jannesari. A vision based deep reinforcement learning algorithm for uav obstacle avoidance. InProceedings of SAI Intelligent Systems Conference, pages 115–128. Springer, 2021

  39. [39]

    Abhik Singla, Sindhu Padakandla, and Shalabh Bhatna- gar. Memory-based deep reinforcement learning for ob- stacle avoidance in uav with limited environment knowl- edge.IEEE transactions on intelligent transportation systems, 22(1):107–118, 2019

  40. [40]

    A vision- based irregular obstacle avoidance framework via deep reinforcement learning

    Lingping Gao, Jianchuan Ding, Wenxi Liu, Haiyin Piao, Yuxin Wang, Xin Yang, and Baocai Yin. A vision- based irregular obstacle avoidance framework via deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9262–9269. IEEE, 2021

  41. [41]

    Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023

    Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023

  42. [42]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

  43. [43]

    Drl-vo: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

    Zhanteng Xie and Philip Dames. Drl-vo: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

  44. [44]

    End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025

    Zifan Wang, Xun Yang, Jianzhuang Zhao, Jiaming Zhou, Teli Ma, Ziyao Gao, Arash Ajoudani, and Junwei Liang. End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025

  45. [45]

    Qihan Qi, Hai Lin, Xinsong Yang, Yaping Sun, Xingxing Ju, and Wenwu Yu. Distributed collision-free control of mass by combining reinforcement learning with fil- tered position barrier certificates and applications.IEEE Transactions on Automation Science and Engineering, 22:23849–23860, 2025

  46. [46]

    Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments.arXiv preprint arXiv:2505.19214, 2025

    Zifan Wang, Teli Ma, Yufei Jia, Xun Yang, Jiaming Zhou, Wenlong Ouyang, Qiang Zhang, and Junwei Liang. Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments.arXiv preprint arXiv:2505.19214, 2025

  47. [47]

    Flow-aided flight through dynamic clutters from point to motion.IEEE Robotics and Automation Letters, 11(1):218–225, 2025

    Bowen Xu, Zexuan Yan, Minghao Lu Member, Xiyu Fan, Yi Luo, Youshen Lin, Zhiqiang Chen, Yeke Chen, 18 Qiyuan Qiao, and Peng Lu. Flow-aided flight through dynamic clutters from point to motion.IEEE Robotics and Automation Letters, 11(1):218–225, 2025

  48. [48]

    Vision-guided collision avoidance through deep reinforcement learning

    Sirui Song, Yuanhang Zhang, Xi Qin, Kirk Saunders, and Jundong Liu. Vision-guided collision avoidance through deep reinforcement learning. InNAECON 2021-IEEE National Aerospace and Electronics Conference, pages 191–194. IEEE, 2021

  49. [49]

    Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021

    Zhihan Xue and Tad Gonsalves. Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021

  50. [50]

    Towards monocular vision-based autonomous flight through deep reinforcement learning.Expert Sys- tems with Applications, 198:116742, 2022

    Minwoo Kim, Jongyun Kim, Minjae Jung, and Hyon- dong Oh. Towards monocular vision-based autonomous flight through deep reinforcement learning.Expert Sys- tems with Applications, 198:116742, 2022

  51. [51]

    Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning

    Raffaele Brilli, Marco Legittimo, Francesco Crocetti, Mirko Leomanni, Mario Luca Fravolini, and Gabriele Costante. Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 12535–12541. IEEE, 2023

  52. [52]

    Time-optimal flight with safety constraints and datadriven dynamics.arXiv preprint arXiv:2403.17551, 2024

    Maria Krinner, Angel Romero, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron, and Davide Scara- muzza. Time-optimal flight with safety constraints and datadriven dynamics.arXiv preprint arXiv:2403.17551, 2024

  53. [53]

    Actor-critic model predictive control

    Angel Romero, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14777–14784. IEEE, 2024

  54. [54]

    A digi- tal twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping.Robotics and Computer-Integrated Manufacturing, 78:102365, 2022

    Yongkui Liu, He Xu, Ding Liu, and Lihui Wang. A digi- tal twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping.Robotics and Computer-Integrated Manufacturing, 78:102365, 2022

  55. [55]

    TRANSIC: Sim-to-real policy transfer by learning from online correction

    Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, and Li Fei-Fei. TRANSIC: Sim-to-real policy transfer by learning from online correction. In8th Annual Conference on Robot Learning, 2024. URL https:// openreview.net/forum?id=lpjPft4RQT

  56. [56]

    Sim-to-real transfer for visual reinforcement learning of deformable object manipu- lation for robot-assisted surgery.IEEE Robotics and Automation Letters, 8(2):560–567, 2022

    Paul Maria Scheikl, Eleonora Tagliabue, Bal ´azs Gyenes, Martin Wagner, Diego Dall’Alba, Paolo Fiorini, and Franziska Mathis-Ullrich. Sim-to-real transfer for visual reinforcement learning of deformable object manipu- lation for robot-assisted surgery.IEEE Robotics and Automation Letters, 8(2):560–567, 2022

  57. [57]

    Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(12):14745–14759, 2023

    Jingda Wu, Yanxin Zhou, Haohan Yang, Zhiyu Huang, and Chen Lv. Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(12):14745–14759, 2023

  58. [58]

    Collision avoidance in pedestrian-rich environments with deep reinforcement learning.Ieee Access, 9:10357– 10377, 2021

    Michael Everett, Yu Fan Chen, and Jonathan P How. Collision avoidance in pedestrian-rich environments with deep reinforcement learning.Ieee Access, 9:10357– 10377, 2021

  59. [59]

    Learning perception-aware ag- ile flight in cluttered environments.arXiv preprint arXiv:2210.01841, 2022

    Yunlong Song, Kexin Shi, Robert Penicka, and Da- vide Scaramuzza. Learning perception-aware ag- ile flight in cluttered environments.arXiv preprint arXiv:2210.01841, 2022

  60. [60]

    Learning a state representation and navigation in cluttered and dynamic environments.IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021

    David Hoeller, Lorenz Wellhausen, Farbod Farshidian, and Marco Hutter. Learning a state representation and navigation in cluttered and dynamic environments.IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021

  61. [61]

    Contrastive learn- ing for enhancing robust scene transfer in vision-based agile flight

    Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chun- wei Xing, and Davide Scaramuzza. Contrastive learn- ing for enhancing robust scene transfer in vision-based agile flight. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5330–5337. IEEE, 2024

  62. [62]

    Dropo: Sim-to-real transfer with offline domain randomization

    Gabriele Tiboni, Karol Arndt, and Ville Kyrki. Dropo: Sim-to-real transfer with offline domain randomization. Robotics and Autonomous Systems, 166:104432, 2023

  63. [63]

    Vins-mono: A robust and versatile monocular visual-inertial state estimator.IEEE Transactions on Robotics, 34(4):1004– 1020, 2018

    Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator.IEEE Transactions on Robotics, 34(4):1004– 1020, 2018. doi: 10.1109/TRO.2018.2853729

  64. [64]

    Fast-lio2: Fast direct lidar-inertial odometry

    Wei Xu, Yixi Cai, Dongjiao He, Jiarong Lin, and Fu Zhang. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Transactions on Robotics, 38(4):2053–2073, 2022. doi: 10.1109/TRO.2022.3141876

  65. [65]

    Improving stochastic policy gradients in continuous con- trol with deep reinforcement learning using the beta distribution

    Po-Wei Chou, Daniel Maturana, and Sebastian Scherer. Improving stochastic policy gradients in continuous con- trol with deep reinforcement learning using the beta distribution. InInternational conference on machine learning, pages 834–843. PMLR, 2017

  66. [66]

    High-Dimensional Continuous Control Using Generalized Advantage Estimation

    John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation.arXiv preprint arXiv:1506.02438, 2015

  67. [67]

    Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments

    Jiahao Lin, Hai Zhu, and Javier Alonso-Mora. Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments. In2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2682–2688. IEEE, 2020

  68. [68]

    Onboard dynamic-object detection and tracking for autonomous robot navigation with rgb-d camera.IEEE Robotics and Automation Letters, 9(1):651–658, 2023

    Zhefan Xu, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, and Kenji Shimada. Onboard dynamic-object detection and tracking for autonomous robot navigation with rgb-d camera.IEEE Robotics and Automation Letters, 9(1):651–658, 2023

  69. [69]

    Motion planning in dynamic environments using velocity obstacles.The international journal of robotics research, 17(7):760– 772, 1998

    Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles.The international journal of robotics research, 17(7):760– 772, 1998

  70. [70]

    ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

    Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, and Kenji Shimada. Vision-aided uav navigation and dynamic obstacle avoidance using gradient-based b- spline trajectory optimization. In2023 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 1214–1220, 2023. doi: 10.1109/ICRA48891.2023. 10160638

  71. [71]

    Autonomous uav exploration of dynamic environments via incremental sampling and probabilistic roadmap.IEEE Robotics and Automation Letters, 6(2):2729–2736, 2021

    Zhefan Xu, Di Deng, and Kenji Shimada. Autonomous uav exploration of dynamic environments via incremental sampling and probabilistic roadmap.IEEE Robotics and Automation Letters, 6(2):2729–2736, 2021. doi: 10.1109/ LRA.2021.3062008