NavRL++: A System-Level Framework for Improving Sim-to-Real Transfer in Reinforcement Learning-Based Robot Navigation

Hanyu Jin; Kenji Shimada; Zhefan Xu

arXiv: 2605.15559 · v1 · pith:SL3GKCX4new · submitted 2026-05-15 · 💻 cs.RO

Zhefan Xu, Hanyu Jin, Kenji Shimada

Pith reviewed 2026-05-20 19:11 UTC · model grok-4.3

classification 💻 cs.RO

keywords: sim-to-real transfer · reinforcement learning · robot navigation · perturbation-aware fine-tuning · Transformer policy · zero-shot transfer · aerial robots · legged robots


The pith

Perturbation-aware fine-tuning and a Transformer policy let RL navigation transfer zero-shot from sim to real robots by modeling noise, latency, and control gaps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a full training and deployment pipeline for reinforcement learning robot navigation that goes beyond typical framework tweaks to focus on sim-to-real transfer. Through experiments it isolates how sensor noise, perception failures, system latency, and control response each degrade performance, then introduces perturbation-aware fine-tuning to adapt policies after initial training. A Transformer-based policy that reasons over a short window of recent observations further mitigates perception degradation and smooths control. The resulting policies outperform learning-based baselines in both static and dynamic simulated settings, perform comparably to optimization-based planners in static settings, and deploy on real aerial and legged robots for exploration and inspection without any additional real-world updates. A reader would care because the work supplies practical, measurable steps toward closing the simulation-to-reality gap that has long limited RL on physical robots.

Core claim

The paper claims that a system-level framework built around systematic identification of four domain discrepancies, followed by perturbation-aware fine-tuning to account for them and a Transformer-based temporal reasoning policy using short-horizon observations, enables reinforcement learning navigation policies to achieve effective zero-shot sim-to-real transfer, outperforming learning-based baselines across environments and reaching performance levels comparable to optimization-based planners in static settings.

What carries the argument

Perturbation-aware fine-tuning, a post-training adaptation step that explicitly incorporates the four identified sim-to-real discrepancies, together with a Transformer-based policy that processes short sequences of recent observations to improve temporal reasoning and control smoothness.
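To make the temporal-reasoning idea concrete, the following is a minimal sketch of single-head scaled dot-product attention in which the newest observation attends over a short window of recent observations. It is not the paper's architecture: the window length, the feature size, the function names, and the use of identity projections (no learned weights) are all assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_latest(window):
    """Summarize `window` (list of feature vectors, oldest first) by letting
    the newest frame act as the query over every frame in the window."""
    query = window[-1]
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the query and each frame (identity projections).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in window]
    weights = softmax(scores)
    dim = len(query)
    summary = [sum(w * frame[i] for w, frame in zip(weights, window)) for i in range(dim)]
    return summary, weights

# Hypothetical 3-step window of 2-D features; the middle frame mimics a
# perception dropout (all zeros). Values are illustrative only.
window = [[1.0, 0.0], [0.0, 0.0], [1.0, 0.1]]
summary, weights = attend_latest(window)
```

In this toy window the zeroed frame receives the smallest attention weight, which hints at how a temporal policy could lean on neighboring frames when one observation fails. In a trained model, learned query, key, and value projections would determine the weighting rather than raw dot products.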

If this is right

Policies trained in simulation can be deployed directly on physical robots across aerial and legged platforms for navigation tasks.
The method outperforms other learning-based navigation approaches in both static and dynamic simulated environments.
In static environments the learned policies reach performance levels comparable to optimization-based planners without requiring online optimization.
The combination of fine-tuning and temporal policy reduces the impact of real-world sensor and control imperfections during deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same discrepancy-modeling approach could be applied to other robotic control problems that currently suffer from sim-to-real gaps.
If the fine-tuning step proves cheap to run, it might reduce reliance on large-scale real-world data collection for RL robotics.
Extending the short-horizon Transformer reasoning to longer sequences could support more complex multi-step navigation behaviors.

Load-bearing premise

That the four listed discrepancies are the dominant factors to model and that explicitly fine-tuning for them after training will produce policies that generalize to new real-world settings and tasks without further adaptation.
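As a rough illustration of what modeling the four discrepancies could look like during fine-tuning, the sketch below wraps observations and commands in a toy perturbation channel. The class name, the Gaussian noise model, the whole-frame dropout, the FIFO delay, the first-order lag, and every default value are assumptions for illustration; the paper's actual perturbation models and parameters are not given in the material reviewed here.

```python
import random
from collections import deque

class PerturbedChannel:
    """Toy model of the four discrepancies; all parameter values are illustrative assumptions."""

    def __init__(self, noise_std=0.05, dropout_prob=0.1, latency_steps=2,
                 response_alpha=0.5, seed=0):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        self.dropout_prob = dropout_prob
        self.response_alpha = response_alpha
        # System latency: commands wait in a FIFO queue before reaching the actuator.
        self.queue = deque([0.0] * latency_steps)
        self.applied = 0.0  # command the actuator is currently realizing

    def observe(self, true_ranges):
        """Sensor noise (additive Gaussian) plus perception failure (frame dropped)."""
        if self.rng.random() < self.dropout_prob:
            return None  # perception failure: no usable observation this step
        return [r + self.rng.gauss(0.0, self.noise_std) for r in true_ranges]

    def act(self, command):
        """System latency (delay queue) followed by control response (first-order lag)."""
        self.queue.append(command)
        delayed = self.queue.popleft()
        self.applied += self.response_alpha * (delayed - self.applied)
        return self.applied

# Usage: a step command of 1.0 reaches the actuator two steps late and is then
# tracked gradually rather than instantly.
channel = PerturbedChannel(noise_std=0.0, dropout_prob=0.0)
step_response = [channel.act(1.0) for _ in range(4)]
```

Keeping each discrepancy behind its own parameter makes it straightforward to switch perturbations on individually or in combination, which is the kind of isolation the empirical study describes.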

What would settle it

A real-world test on an unseen robot platform or environment in which a policy produced by the described fine-tuning and Transformer architecture suffers a sharp drop in navigation performance would show that the transfer is not as robust as claimed.

Figures

Figures reproduced from arXiv: 2605.15559 by Hanyu Jin, Kenji Shimada, Zhefan Xu.

**Figure 1.** Overview of NavRL++ real-world deployment. The proposed system supports zero-shot deployment across multiple environments and robotic platforms, including quadruped and aerial robots. The results highlight robust navigation in dynamic, cluttered, and task-oriented scenarios such as pedestrian collision avoidance, surface inspection, and unknown exploration, demonstrating deployment across multiple robotic …

**Figure 2.** Illustration of the robotic platforms used to deploy our RL navigation …

**Figure 3.** Overview of the proposed reinforcement learning–based autonomous navigation framework. The system supports camera-only, LiDAR-only, and …

**Figure 4.** Illustration of static obstacle and action representation. (a) Occupancy …

**Figure 5.** Illustration of the proposed training strategy. (a) The policy is …

**Figure 6.** Illustration of the perception module. (a) Raw sensor data, including …

**Figure 7.** Overall performance benchmarking among baseline RL approaches.

**Figure 8.** Visualization of simulation evaluation with 100 (out of 10,000) …

**Figure 9.** Effect of individual perturbations and their combinations on success rate and control effort of a policy trained in clean simulation.

**Figure 10.** Navigation success rate evaluation at each curriculum training stage.

**Figure 11.** Comparison of navigation success rates between policies trained with …

**Figure 12.** Comparison of training success rates under varying training scales.

**Figure 14.** Navigation in Isaac Sim. Static obstacles are visualized as red and …

**Figure 15.** Gazebo navigation experiments with environment layouts and sample trajectories. The first column presents two randomly generated environments: …

**Figure 16.** Navigation experiments in Gazebo simulated real-world scenarios.

**Figure 17.** Real-world navigation experiments on both aerial and legged robotic platforms. (a) Navigation in cluttered environments using a UAV, with the …

**Figure 18.** Navigation-centric applications using the proposed framework. In both exploration and inspection tasks, the task planner generates global waypoints …

Original abstract

Recent years have witnessed significant progress in autonomous navigation using reinforcement learning. However, existing approaches largely emphasize reinforcement learning framework design, such as input representations, action spaces, and reward functions, while providing limited analysis of sim-to-real transfer and insufficient insight into how training strategies affect real-world deployment performance. To bridge this gap, we not only introduce an effective RL framework but also present a complete training and deployment pipeline, along with a systematic empirical study that disentangles the key factors affecting sim-to-real transfer in reinforcement learning-based navigation, including sensor noise, perception failures, system latency, and control response. Building on insights from this analysis, we introduce perturbation-aware fine-tuning, a post-training adaptation strategy that improves transfer robustness by explicitly accounting for empirically identified domain discrepancies. To further mitigate perception degradation and enhance control smoothness in real-world deployment, we propose a Transformer-based temporal reasoning policy that leverages short-horizon observation for navigation control. We quantitatively evaluate how individual sim-to-real perturbations and training design choices impact navigation performance across environments. Experimental results demonstrate that the proposed training strategy and policy architecture outperform learning-based baselines in both static and dynamic environments, while achieving performance comparable to optimization-based planners in static settings. We validate our approach through real-world deployment on multiple robotic platforms, including aerial and legged robots, across navigation-centric tasks such as exploration and inspection, demonstrating zero-shot sim-to-real transfer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers a useful empirical breakdown of four sim-to-real factors in RL navigation plus a practical fine-tuning step, but the zero-shot transfer claims rest on untested assumptions about how completely those factors cover real gaps.


This paper's main point is that running a controlled study on sensor noise, perception failures, latency, and control response lets you do targeted post-training fine-tuning that improves zero-shot transfer for RL navigation policies. They add a Transformer policy to handle short-horizon observations and test the whole pipeline on aerial and legged robots for exploration and inspection tasks. The results show gains over other learned methods and parity with classical planners in static cases, with real deployments that work without extra adaptation in their trials.

Referee Report

2 major / 2 minor

Summary. The manuscript presents NavRL++, a system-level framework for RL-based robot navigation. It includes a complete training and deployment pipeline, a systematic empirical study disentangling sim-to-real factors (sensor noise, perception failures, system latency, control response), perturbation-aware fine-tuning as a post-training adaptation strategy, and a Transformer-based temporal reasoning policy using short-horizon observations. The central claims are that the approach outperforms learning-based baselines in static and dynamic environments, matches optimization-based planners in static settings, and achieves zero-shot sim-to-real transfer validated on aerial and legged robots for exploration and inspection tasks.

Significance. If the performance claims and zero-shot transfer results hold under rigorous controls, the work would offer practical value to the robotics community by providing a systematic dissection of domain discrepancies and a targeted fine-tuning method that reduces reliance on real-world adaptation. The multi-platform real-world validation and comparison to both learning and optimization baselines are positive elements that could inform future sim-to-real pipelines in navigation.

major comments (2)

[Experimental Results / Evaluation sections] The abstract and the described experimental pipeline report quantitative improvements and successful zero-shot deployments but give no details on the number of trials, statistical significance testing, variance across runs, or the precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, because that claim hinges on whether these factors are dominant and accurately matched to real-world statistics.
[Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.
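The reporting requested in the first major comment can be sketched in a few lines. The example below computes mean and sample standard deviation across runs, plus Welch's t statistic with Welch-Satterthwaite degrees of freedom for a two-condition comparison. The function names are hypothetical and the success rates are made-up placeholders, not results from the paper. The Python standard library has no t-distribution CDF, so the p-value itself would come from a statistics package (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics

def summarize(success_rates):
    """Mean and sample standard deviation of per-run success rates."""
    return statistics.mean(success_rates), statistics.stdev(success_rates)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Made-up per-seed success rates, NOT results from the paper.
fine_tuned = [0.90, 0.92, 0.88, 0.91, 0.89]
baseline = [0.80, 0.85, 0.78, 0.82, 0.80]

mean_ft, std_ft = summarize(fine_tuned)
t_stat, dof = welch_t(fine_tuned, baseline)
```

Welch's variant is used here because nothing guarantees that a fine-tuned policy and a baseline have equal variance across seeds; that choice is an editorial assumption rather than something the manuscript specifies.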

minor comments (2)

[Policy Architecture] Notation for the Transformer policy's short-horizon observation window and temporal reasoning components could be clarified with explicit equations or pseudocode to improve reproducibility.
[Conclusion] The manuscript would benefit from a dedicated limitations paragraph discussing the scope of environments and tasks tested, to temper the generalization claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to improve experimental transparency and add supporting analyses where needed.


Referee: [Experimental Results / Evaluation sections] The abstract and the described experimental pipeline report quantitative improvements and successful zero-shot deployments but give no details on the number of trials, statistical significance testing, variance across runs, or the precise distributions used to model the four discrepancies (sensor noise, perception failures, system latency, control response). This information is load-bearing for the central claim that perturbation-aware fine-tuning produces robust policies, because that claim hinges on whether these factors are dominant and accurately matched to real-world statistics.

Authors: We agree that explicit reporting of trial counts, variance, statistical tests, and precise perturbation distributions is necessary to support the robustness claims. The current manuscript does not include these details at the requested level of specificity. In the revision we will expand the Experimental Results and Setup sections to report the number of independent trials conducted for each condition, include mean performance with standard deviation across runs, add statistical significance testing (e.g., t-tests with p-values for key comparisons), and specify the exact distributions and parameters used for each of the four discrepancies, calibrated from real-robot measurements. These additions will be placed in the main text and supplementary material.
Revision: yes
Referee: [Perturbation-Aware Fine-Tuning subsection] The section introducing perturbation-aware fine-tuning does not include ablation or sensitivity analysis showing that performance remains stable when any of the four modeled discrepancies is deliberately misspecified or when additional unmodeled effects (e.g., actuator nonlinearities or lighting variation) are present. Without such evidence, the sufficiency assumption for zero-shot transfer cannot be fully assessed from the reported results.

Authors: We acknowledge that the current presentation lacks explicit sensitivity analysis to parameter misspecification and unmodeled effects. While the empirical study already isolates the contribution of each modeled factor, we agree this does not fully address robustness to imperfect modeling. In the revision we will add a dedicated sensitivity analysis subsection (or appendix) that reports policy performance under deliberate ±20-30% perturbations to each discrepancy parameter and under additional unmodeled effects such as actuator nonlinearities and lighting changes. These results will be used to qualify the zero-shot transfer claims.
Revision: yes
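The promised sensitivity analysis amounts to a one-parameter sweep. The sketch below rescales a single discrepancy parameter by 0.7x to 1.3x around its nominal value and records the largest drop relative to the nominal setting. The function names, the 0.10 s nominal latency, and the linear `toy_success_rate` stand-in are hypothetical; a real study would replace the stand-in with simulated rollouts of the fine-tuned policy.

```python
def sensitivity_sweep(evaluate, nominal, scales=(0.7, 0.8, 1.0, 1.2, 1.3)):
    """Evaluate a policy metric with one discrepancy parameter rescaled
    around its nominal value; returns {scale: metric}."""
    return {s: evaluate(nominal * s) for s in scales}

def max_drop(results):
    """Largest loss in the metric relative to the nominal (scale 1.0) setting."""
    return max(results[1.0] - value for value in results.values())

# Hypothetical stand-in for policy rollouts: success decays linearly with
# latency (seconds). Illustrative only, not a model from the paper.
def toy_success_rate(latency_s):
    return max(0.0, 1.0 - 2.0 * latency_s)

results = sensitivity_sweep(toy_success_rate, nominal=0.10)
worst = max_drop(results)
```

Reporting the worst-case drop alongside the full sweep gives a single number for judging whether misspecifying a parameter by 20-30% materially weakens the zero-shot claim.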

Circularity Check

0 steps flagged

No circularity: claims rest on empirical measurements and deployments without self-referential derivations or fitted predictions

full rationale

The paper describes an RL navigation framework, perturbation-aware fine-tuning, and a Transformer policy, supported by quantitative experiments and real-world deployments on aerial/legged robots. No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or ansatzes from the same work. The four domain discrepancies are identified via analysis and then explicitly modeled, but this is an empirical design choice rather than a definitional loop. Central performance claims are validated externally via physical tests, satisfying independence criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work implicitly relies on standard RL assumptions about reward design and environment modeling plus the unstated premise that the listed perturbations capture the main transfer gap.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: perturbation-aware fine-tuning... explicitly accounting for empirically identified domain discrepancies... sensor noise, perception failures, system latency, and control response

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and embed_strictMono · unclear
Paper passage: Transformer-based temporal reasoning policy... short-horizon observation histories

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 6 internal anchors

[1]

Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

Ziran Li, Yanwen Zhang, Hao Wu, Satoshi Suzuki, Akio Namiki, and Wei Wang. Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

work page 2023
[2]

Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

Won Joon Yun, Soohyun Park, Joongheon Kim, Myung- Jae Shin, Soyi Jung, David A Mohaisen, and Jae-Hyun Kim. Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

work page 2022
[3]

Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

Boyu Zhou, Hao Xu, and Shaojie Shen. Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

work page 2023
[4]

Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019

Boyu Zhou, Fei Gao, Luqi Wang, Chuhao Liu, and Shaojie Shen. Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019. doi: 10.1109/LRA.2019.2927938

work page doi:10.1109/lra.2019.2927938 2019
[5]

IEEE Robotics and Automation Letters5(2), 3422–3429 (2020) https://doi.org/10.1109/LRA.2020

Xin Zhou, Zhepei Wang, Hongkai Ye, Chao Xu, and Fei Gao. Ego-planner: An esdf-free gradient-based local planner for quadrotors.IEEE Robotics and Automation Letters, 6(2):478–485, 2021. doi: 10.1109/LRA.2020. 3047728

work page doi:10.1109/lra.2020 2021
[6]

Far planner: Fast, attemptable route planner using dynamic visibility update

Fan Yang, Chao Cao, Hongbiao Zhu, Jean Oh, and Ji Zhang. Far planner: Fast, attemptable route planner using dynamic visibility update. In2022 ieee/rsj interna- tional conference on intelligent robots and systems (iros), pages 9–16. IEEE, 2022

work page 2022
[7]

Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

Minghao Lu, Xiyu Fan, Han Chen, and Peng Lu. Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

work page 2024
[8]

Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

work page 1999
[9]

Benchmarking deep reinforcement learn- ing for continuous control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learn- ing for continuous control. InInternational conference on machine learning, pages 1329–1338. PMLR, 2016

work page 2016
[10]

Dong Hu, Chao Huang, Jingda Wu, and Xin Yuan. To- ward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy.IEEE Transactions on Intelligent Transportation Systems, 2025

work page 2025
[11]

Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, and Guanya Shi. Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

work page arXiv 2024
[12]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Mu ˜noz, Xinjie Yao, Ren ´e Zurbr ¨ugg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi- modal robot learning.arXiv preprint arXiv:2511.04831, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

Antonio Loquercio, Elia Kaufmann, Ren ´e Ranftl, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

work page 2021
[14]

DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

Zhanteng Xie and Philip Dames. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023. doi: 10.1109/TRO.2023.3257549

work page doi:10.1109/tro.2023.3257549 2023
[15]

You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

Junjie Lu, Xuewei Zhang, Hongming Shen, Liwen Xu, and Bailing Tian. You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

work page 2024
[16]

Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025

Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, and Kenji Shimada. Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025. doi: 10.1109/LRA.2025.3546069

work page doi:10.1109/lra.2025.3546069 2025
[17]

Deep Neural Network for Real-Time Autonomous Indoor Navigation

Dong Ki Kim and Tsuhan Chen. Deep neural network for real-time autonomous indoor navigation.arXiv preprint arXiv:1511.04668, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[18]

A deep-network solution towards model-less obstacle avoidance

Lei Tai, Shaohua Li, and Ming Liu. A deep-network solution towards model-less obstacle avoidance. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2759–2764, 2016. doi: 10. 1109/IROS.2016.7759428

work page arXiv 2016
[19]

Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness

Nikolai Smolyanskiy, Alexey Kamenev, Jeffrey Smith, and Stan Birchfield. Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4241–4247. IEEE, 2017

work page 2017
[20]

Learn- ing to fly by crashing

Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learn- ing to fly by crashing. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3948–3955. IEEE, 2017

work page 2017
[21]

Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

Ram Prasad Padhy, Sachin Verma, Shahzad Ahmad, Suman Kumar Choudhury, and Pankaj Kumar Sa. Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

work page 2018
[22]

Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles

Arbaaz Khan and Martial Hebert. Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles. In2018 IEEE Aerospace Conference, pages 1–9. IEEE, 2018

work page 2018
[23]

A deep learn- ing approach towards autonomous flight in forest envi- ronments

Sinahi Dionisio-Ortega, L Oyuki Rojas-Perez, Jose Martinez-Carranza, and Israel Cruz-Vega. A deep learn- ing approach towards autonomous flight in forest envi- ronments. In2018 International Conference on Electron- ics, Communications and Computers (CONIELECOMP), pages 139–144. IEEE, 2018

work page 2018
[24]

Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

Antonio Loquercio, Ana I Maqueda, Carlos R Del- Blanco, and Davide Scaramuzza. Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

work page 2018
[25]

Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

Varun Tolani, Somil Bansal, Aleksandra Faust, and Claire Tomlin. Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

work page 2021
[26]

ViNT: A foundation model for visual navigation,

Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. Vint: A foundation model for visual navigation. arXiv preprint arXiv:2306.14846, 2023

work page arXiv 2023
[27]

Nomad: Goal masked diffusion policies for navigation and exploration

Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. Nomad: Goal masked diffusion policies for navigation and exploration. In2024 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 63–70. IEEE, 2024

work page 2024
[28]

Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, and Weiyao Lin. Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

work page 2025
[29]

Neupan: Direct point robot navigation with end-to-end model-based learning

Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Jianjun Chen, Shijie Lin, Chengyang Li, Chengzhong Xu, Yonina C Eldar, Qi Hao, et al. Neupan: Direct point robot navigation with end-to-end model-based learning. IEEE Transactions on Robotics, 2025

work page 2025
[30]

Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, and Jiangmiao Pang. Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

work page arXiv 2025
[31]

Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

Sunggoo Jung, Sunyou Hwang, Heemin Shin, and David Hyunchul Shim. Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

work page 2018
[32]

Mononav: Mav navigation via monocular depth estimation and reconstruction

Nathaniel Simon and Anirudha Majumdar. Mononav: Mav navigation via monocular depth estimation and reconstruction. InInternational Symposium on Exper- imental Robotics, pages 415–426. Springer, 2023

work page 2023
[33]

Task-driven com- pression for collision encoding based on depth images

Mihir Kulkarni and Kostas Alexis. Task-driven com- pression for collision encoding based on depth images. InInternational Symposium on Visual Computing, pages 259–273. Springer, 2023

work page 2023
[34]

CAD2RL: Real Single-Image Flight without a Single Real Image

Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image.arXiv preprint arXiv:1611.04201, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[35]

Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning

Linhai Xie, Sen Wang, Andrew Markham, and Niki Trigoni. Towards monocular vision based obstacle avoid- ance through deep reinforcement learning.arXiv preprint arXiv:1706.09829, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[36]

Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning

Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P How. Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 285–292. IEEE, 2017

work page 2017
[37]

Socially aware motion planning with deep re- inforcement learning

Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep re- inforcement learning. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017

work page 2017
[38] Jeremy Roghair, Amir Niaraki, Kyungtae Ko, and Ali Jannesari. A vision based deep reinforcement learning algorithm for uav obstacle avoidance. In Proceedings of SAI Intelligent Systems Conference, pages 115–128. Springer, 2021
[39] Abhik Singla, Sindhu Padakandla, and Shalabh Bhatnagar. Memory-based deep reinforcement learning for obstacle avoidance in uav with limited environment knowledge. IEEE Transactions on Intelligent Transportation Systems, 22(1):107–118, 2019
[40] Lingping Gao, Jianchuan Ding, Wenxi Liu, Haiyin Piao, Yuxin Wang, Xin Yang, and Baocai Yin. A vision-based irregular obstacle avoidance framework via deep reinforcement learning. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9262–9269. IEEE, 2021
[41] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023
[42] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017
[43] Zhanteng Xie and Philip Dames. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles. IEEE Transactions on Robotics, 39(4):2700–2719, 2023
[44] Zifan Wang, Xun Yang, Jianzhuang Zhao, Jiaming Zhou, Teli Ma, Ziyao Gao, Arash Ajoudani, and Junwei Liang. End-to-end humanoid robot safe and comfortable locomotion policy. arXiv preprint arXiv:2508.07611, 2025
[45] Qihan Qi, Hai Lin, Xinsong Yang, Yaping Sun, Xingxing Ju, and Wenwu Yu. Distributed collision-free control of mass by combining reinforcement learning with filtered position barrier certificates and applications. IEEE Transactions on Automation Science and Engineering, 22:23849–23860, 2025
[46] Zifan Wang, Teli Ma, Yufei Jia, Xun Yang, Jiaming Zhou, Wenlong Ouyang, Qiang Zhang, and Junwei Liang. Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments. arXiv preprint arXiv:2505.19214, 2025
[47] Bowen Xu, Zexuan Yan, Minghao Lu, Xiyu Fan, Yi Luo, Youshen Lin, Zhiqiang Chen, Yeke Chen, Qiyuan Qiao, and Peng Lu. Flow-aided flight through dynamic clutters from point to motion. IEEE Robotics and Automation Letters, 11(1):218–225, 2025
[48] Sirui Song, Yuanhang Zhang, Xi Qin, Kirk Saunders, and Jundong Liu. Vision-guided collision avoidance through deep reinforcement learning. In NAECON 2021-IEEE National Aerospace and Electronics Conference, pages 191–194. IEEE, 2021
[49] Zhihan Xue and Tad Gonsalves. Vision based drone obstacle avoidance by deep reinforcement learning. AI, 2(3):366–380, 2021
[50] Minwoo Kim, Jongyun Kim, Minjae Jung, and Hyondong Oh. Towards monocular vision-based autonomous flight through deep reinforcement learning. Expert Systems with Applications, 198:116742, 2022
[51] Raffaele Brilli, Marco Legittimo, Francesco Crocetti, Mirko Leomanni, Mario Luca Fravolini, and Gabriele Costante. Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 12535–12541. IEEE, 2023
[52] Maria Krinner, Angel Romero, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron, and Davide Scaramuzza. Time-optimal flight with safety constraints and data-driven dynamics. arXiv preprint arXiv:2403.17551, 2024
[53] Angel Romero, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14777–14784. IEEE, 2024
[54] Yongkui Liu, He Xu, Ding Liu, and Lihui Wang. A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping. Robotics and Computer-Integrated Manufacturing, 78:102365, 2022
[55] Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, and Li Fei-Fei. TRANSIC: Sim-to-real policy transfer by learning from online correction. In 8th Annual Conference on Robot Learning, 2024. URL https://openreview.net/forum?id=lpjPft4RQT
[56] Paul Maria Scheikl, Eleonora Tagliabue, Balázs Gyenes, Martin Wagner, Diego Dall’Alba, Paolo Fiorini, and Franziska Mathis-Ullrich. Sim-to-real transfer for visual reinforcement learning of deformable object manipulation for robot-assisted surgery. IEEE Robotics and Automation Letters, 8(2):560–567, 2022
[57] Jingda Wu, Yanxin Zhou, Haohan Yang, Zhiyu Huang, and Chen Lv. Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):14745–14759, 2023
[58] Michael Everett, Yu Fan Chen, and Jonathan P How. Collision avoidance in pedestrian-rich environments with deep reinforcement learning. IEEE Access, 9:10357–10377, 2021
[59] Yunlong Song, Kexin Shi, Robert Penicka, and Davide Scaramuzza. Learning perception-aware agile flight in cluttered environments. arXiv preprint arXiv:2210.01841, 2022
[60] David Hoeller, Lorenz Wellhausen, Farbod Farshidian, and Marco Hutter. Learning a state representation and navigation in cluttered and dynamic environments. IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021
[61] Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chunwei Xing, and Davide Scaramuzza. Contrastive learning for enhancing robust scene transfer in vision-based agile flight. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5330–5337. IEEE, 2024
[62] Gabriele Tiboni, Karol Arndt, and Ville Kyrki. Dropo: Sim-to-real transfer with offline domain randomization. Robotics and Autonomous Systems, 166:104432, 2023
[63] Tong Qin, Peiliang Li, and Shaojie Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018. doi: 10.1109/TRO.2018.2853729
[64] Wei Xu, Yixi Cai, Dongjiao He, Jiarong Lin, and Fu Zhang. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Transactions on Robotics, 38(4):2053–2073, 2022. doi: 10.1109/TRO.2022.3141876
[65] Po-Wei Chou, Daniel Maturana, and Sebastian Scherer. Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution. In International Conference on Machine Learning, pages 834–843. PMLR, 2017
[66] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015
[67] Jiahao Lin, Hai Zhu, and Javier Alonso-Mora. Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2682–2688. IEEE, 2020
[68] Zhefan Xu, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, and Kenji Shimada. Onboard dynamic-object detection and tracking for autonomous robot navigation with rgb-d camera. IEEE Robotics and Automation Letters, 9(1):651–658, 2023
[69] Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles. The International Journal of Robotics Research, 17(7):760–772, 1998
[70] Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, and Kenji Shimada. Vision-aided uav navigation and dynamic obstacle avoidance using gradient-based b-spline trajectory optimization. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 1214–1220, 2023. doi: 10.1109/ICRA48891.2023.10160638
[71] Zhefan Xu, Di Deng, and Kenji Shimada. Autonomous uav exploration of dynamic environments via incremental sampling and probabilistic roadmap. IEEE Robotics and Automation Letters, 6(2):2729–2736, 2021. doi: 10.1109/LRA.2021.3062008

[1] [1]

Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

Ziran Li, Yanwen Zhang, Hao Wu, Satoshi Suzuki, Akio Namiki, and Wei Wang. Design and application of a uav autonomous inspection system for high-voltage power transmission lines.Remote Sensing, 15(3):865, 2023

work page 2023

[2] [2]

Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

Won Joon Yun, Soohyun Park, Joongheon Kim, Myung- Jae Shin, Soyi Jung, David A Mohaisen, and Jae-Hyun Kim. Cooperative multiagent deep reinforcement learn- ing for reliable surveillance via autonomous multi-uav control.IEEE Transactions on Industrial Informatics, 18 (10):7086–7096, 2022

work page 2022

[3] [3]

Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

Boyu Zhou, Hao Xu, and Shaojie Shen. Racer: Rapid collaborative exploration with a decentralized multi-uav system.IEEE Transactions on Robotics, 39(3):1816– 1835, 2023

work page 2023

[4] [4]

Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019

Boyu Zhou, Fei Gao, Luqi Wang, Chuhao Liu, and Shaojie Shen. Robust and efficient quadrotor trajectory generation for fast autonomous flight.IEEE Robotics and Automation Letters, 4(4):3529–3536, 2019. doi: 10.1109/LRA.2019.2927938

work page doi:10.1109/lra.2019.2927938 2019

[5] [5]

IEEE Robotics and Automation Letters5(2), 3422–3429 (2020) https://doi.org/10.1109/LRA.2020

Xin Zhou, Zhepei Wang, Hongkai Ye, Chao Xu, and Fei Gao. Ego-planner: An esdf-free gradient-based local planner for quadrotors.IEEE Robotics and Automation Letters, 6(2):478–485, 2021. doi: 10.1109/LRA.2020. 3047728

work page doi:10.1109/lra.2020 2021

[6] [6]

Far planner: Fast, attemptable route planner using dynamic visibility update

Fan Yang, Chao Cao, Hongbiao Zhu, Jean Oh, and Ji Zhang. Far planner: Fast, attemptable route planner using dynamic visibility update. In2022 ieee/rsj interna- tional conference on intelligent robots and systems (iros), pages 9–16. IEEE, 2022

work page 2022

[7] [7]

Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

Minghao Lu, Xiyu Fan, Han Chen, and Peng Lu. Fapp: Fast and adaptive perception and planning for uavs in dynamic cluttered environments.IEEE Transactions on Robotics, 2024

work page 2024

[8] [8]

Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforce- ment learning with function approximation.Advances in neural information processing systems, 12, 1999

work page 1999

[9] [9]

Benchmarking deep reinforcement learn- ing for continuous control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learn- ing for continuous control. InInternational conference on machine learning, pages 1329–1338. PMLR, 2016

work page 2016

[10] [10]

Dong Hu, Chao Huang, Jingda Wu, and Xin Yuan. To- ward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy.IEEE Transactions on Intelligent Transportation Systems, 2025

work page 2025

[11] [11]

Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, and Guanya Shi. Agile but safe: Learn- ing collision-free high-speed legged locomotion.arXiv preprint arXiv:2401.17583, 2024

work page arXiv 2024

[12] [12]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Mu ˜noz, Xinjie Yao, Ren ´e Zurbr ¨ugg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi- modal robot learning.arXiv preprint arXiv:2511.04831, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[13] [13]

Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

Antonio Loquercio, Elia Kaufmann, Ren ´e Ranftl, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Learning high-speed flight in the wild.Science Robotics, 6(59):eabg5810, 2021

work page 2021

[14] [14]

DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

Zhanteng Xie and Philip Dames. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023. doi: 10.1109/TRO.2023.3257549

work page doi:10.1109/tro.2023.3257549 2023

[15] [15]

You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

Junjie Lu, Xuewei Zhang, Hongming Shen, Liwen Xu, and Bailing Tian. You only plan once: A learning-based one-stage planner with guidance learning.IEEE Robotics and Automation Letters, 9(7):6083–6090, 2024

work page 2024

[16] [16]

Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025

Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, and Kenji Shimada. Navrl: Learning safe flight in dynamic environments.IEEE Robotics and Automation Letters, 10 (4):3668–3675, 2025. doi: 10.1109/LRA.2025.3546069

work page doi:10.1109/lra.2025.3546069 2025

[17] [17]

Deep Neural Network for Real-Time Autonomous Indoor Navigation

Dong Ki Kim and Tsuhan Chen. Deep neural network for real-time autonomous indoor navigation.arXiv preprint arXiv:1511.04668, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[18] [18]

A deep-network solution towards model-less obstacle avoidance

Lei Tai, Shaohua Li, and Ming Liu. A deep-network solution towards model-less obstacle avoidance. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2759–2764, 2016. doi: 10. 1109/IROS.2016.7759428

work page arXiv 2016

[19] [19]

Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness

Nikolai Smolyanskiy, Alexey Kamenev, Jeffrey Smith, and Stan Birchfield. Toward low-flying autonomous mav trail navigation using deep neural networks for envi- ronmental awareness. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4241–4247. IEEE, 2017

work page 2017

[20] [20]

Learn- ing to fly by crashing

Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta. Learn- ing to fly by crashing. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3948–3955. IEEE, 2017

work page 2017

[21] [21]

Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

Ram Prasad Padhy, Sachin Verma, Shahzad Ahmad, Suman Kumar Choudhury, and Pankaj Kumar Sa. Deep neural network for autonomous uav navigation in indoor 17 corridor environments.Procedia computer science, 133: 643–650, 2018

work page 2018

[22] [22]

Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles

Arbaaz Khan and Martial Hebert. Learning safe recovery trajectories with deep neural networks for unmanned aerial vehicles. In2018 IEEE Aerospace Conference, pages 1–9. IEEE, 2018

work page 2018

[23] [23]

A deep learn- ing approach towards autonomous flight in forest envi- ronments

Sinahi Dionisio-Ortega, L Oyuki Rojas-Perez, Jose Martinez-Carranza, and Israel Cruz-Vega. A deep learn- ing approach towards autonomous flight in forest envi- ronments. In2018 International Conference on Electron- ics, Communications and Computers (CONIELECOMP), pages 139–144. IEEE, 2018

work page 2018

[24] [24]

Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

Antonio Loquercio, Ana I Maqueda, Carlos R Del- Blanco, and Davide Scaramuzza. Dronet: Learning to fly by driving.IEEE Robotics and Automation Letters, 3(2):1088–1095, 2018

work page 2018

[25] [25]

Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

Varun Tolani, Somil Bansal, Aleksandra Faust, and Claire Tomlin. Visual navigation among humans with optimal control as a supervisor.IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021

work page 2021

[26] [26]

ViNT: A foundation model for visual navigation,

Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. Vint: A foundation model for visual navigation. arXiv preprint arXiv:2306.14846, 2023

work page arXiv 2023

[27] [27]

Nomad: Goal masked diffusion policies for navigation and exploration

Ajay Sridhar, Dhruv Shah, Catherine Glossop, and Sergey Levine. Nomad: Goal masked diffusion policies for navigation and exploration. In2024 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 63–70. IEEE, 2024

work page 2024

[28] [28]

Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, and Weiyao Lin. Learning vision-based agile flight via differentiable physics.Nature Machine Intelligence, pages 1–13, 2025

work page 2025

[29] [29]

Neupan: Direct point robot navigation with end-to-end model-based learning

Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Jianjun Chen, Shijie Lin, Chengyang Li, Chengzhong Xu, Yonina C Eldar, Qi Hao, et al. Neupan: Direct point robot navigation with end-to-end model-based learning. IEEE Transactions on Robotics, 2025

work page 2025

[30] [30]

Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, and Jiangmiao Pang. Logoplanner: Localization grounded navigation policy with metric-aware visual geometry.arXiv preprint arXiv:2512.19629, 2025

work page arXiv 2025

[31] [31]

Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

Sunggoo Jung, Sunyou Hwang, Heemin Shin, and David Hyunchul Shim. Perception, guidance, and nav- igation for indoor autonomous drone racing using deep learning.IEEE Robotics and Automation Letters, 3(3): 2539–2544, 2018

work page 2018

[32] [32]

Mononav: Mav navigation via monocular depth estimation and reconstruction

Nathaniel Simon and Anirudha Majumdar. Mononav: Mav navigation via monocular depth estimation and reconstruction. InInternational Symposium on Exper- imental Robotics, pages 415–426. Springer, 2023

work page 2023

[33] [33]

Task-driven com- pression for collision encoding based on depth images

Mihir Kulkarni and Kostas Alexis. Task-driven com- pression for collision encoding based on depth images. InInternational Symposium on Visual Computing, pages 259–273. Springer, 2023

work page 2023

[34] [34]

CAD2RL: Real Single-Image Flight without a Single Real Image

Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image.arXiv preprint arXiv:1611.04201, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[35] [35]

Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning

Linhai Xie, Sen Wang, Andrew Markham, and Niki Trigoni. Towards monocular vision based obstacle avoid- ance through deep reinforcement learning.arXiv preprint arXiv:1706.09829, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[36] [36]

Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning

Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P How. Decentralized non-communicating multiagent col- lision avoidance with deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 285–292. IEEE, 2017

work page 2017

[37] [37]

Socially aware motion planning with deep re- inforcement learning

Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep re- inforcement learning. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017

work page 2017

[38] [38]

A vision based deep reinforcement learning algorithm for uav obstacle avoidance

Jeremy Roghair, Amir Niaraki, Kyungtae Ko, and Ali Jannesari. A vision based deep reinforcement learning algorithm for uav obstacle avoidance. InProceedings of SAI Intelligent Systems Conference, pages 115–128. Springer, 2021

work page 2021

[39] [39]

Abhik Singla, Sindhu Padakandla, and Shalabh Bhatna- gar. Memory-based deep reinforcement learning for ob- stacle avoidance in uav with limited environment knowl- edge.IEEE transactions on intelligent transportation systems, 22(1):107–118, 2019

work page 2019

[40] [40]

A vision- based irregular obstacle avoidance framework via deep reinforcement learning

Lingping Gao, Jianchuan Ding, Wenxi Liu, Haiyin Piao, Yuxin Wang, Xin Yang, and Baocai Yin. A vision- based irregular obstacle avoidance framework via deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9262–9269. IEEE, 2021

work page 2021

[41] [41]

Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023

Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias M ¨uller, Vladlen Koltun, and Davide Scara- muzza. Champion-level drone racing using deep rein- forcement learning.Nature, 620(7976):982–987, 2023

work page 2023

[42] [42]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[43] [43]

Drl-vo: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

Zhanteng Xie and Philip Dames. Drl-vo: Learning to navigate through crowded dynamic scenes using velocity obstacles.IEEE Transactions on Robotics, 39(4):2700– 2719, 2023

work page 2023

[44] [44]

End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025

Zifan Wang, Xun Yang, Jianzhuang Zhao, Jiaming Zhou, Teli Ma, Ziyao Gao, Arash Ajoudani, and Junwei Liang. End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025

work page arXiv 2025

[45] [45]

Qihan Qi, Hai Lin, Xinsong Yang, Yaping Sun, Xingxing Ju, and Wenwu Yu. Distributed collision-free control of mass by combining reinforcement learning with fil- tered position barrier certificates and applications.IEEE Transactions on Automation Science and Engineering, 22:23849–23860, 2025

work page 2025

[46] [46]

Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments.arXiv preprint arXiv:2505.19214, 2025

Zifan Wang, Teli Ma, Yufei Jia, Xun Yang, Jiaming Zhou, Wenlong Ouyang, Qiang Zhang, and Junwei Liang. Omni-perception: Omnidirectional collision avoidance for legged locomotion in dynamic environments.arXiv preprint arXiv:2505.19214, 2025

work page arXiv 2025

[47] [47]

Flow-aided flight through dynamic clutters from point to motion.IEEE Robotics and Automation Letters, 11(1):218–225, 2025

Bowen Xu, Zexuan Yan, Minghao Lu Member, Xiyu Fan, Yi Luo, Youshen Lin, Zhiqiang Chen, Yeke Chen, 18 Qiyuan Qiao, and Peng Lu. Flow-aided flight through dynamic clutters from point to motion.IEEE Robotics and Automation Letters, 11(1):218–225, 2025

work page 2025

[48] [48]

Vision-guided collision avoidance through deep reinforcement learning

Sirui Song, Yuanhang Zhang, Xi Qin, Kirk Saunders, and Jundong Liu. Vision-guided collision avoidance through deep reinforcement learning. InNAECON 2021-IEEE National Aerospace and Electronics Conference, pages 191–194. IEEE, 2021

work page 2021

[49] [49]

Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021

Zhihan Xue and Tad Gonsalves. Vision based drone obstacle avoidance by deep reinforcement learning.Ai, 2(3):366–380, 2021

work page 2021

[50] [50]

Towards monocular vision-based autonomous flight through deep reinforcement learning.Expert Sys- tems with Applications, 198:116742, 2022

Minwoo Kim, Jongyun Kim, Minjae Jung, and Hyon- dong Oh. Towards monocular vision-based autonomous flight through deep reinforcement learning.Expert Sys- tems with Applications, 198:116742, 2022

work page 2022

[51] [51]

Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning

Raffaele Brilli, Marco Legittimo, Francesco Crocetti, Mirko Leomanni, Mario Luca Fravolini, and Gabriele Costante. Monocular reactive collision avoidance for mav teleoperation with deep reinforcement learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 12535–12541. IEEE, 2023

work page 2023

[52] [52]

Time-optimal flight with safety constraints and datadriven dynamics.arXiv preprint arXiv:2403.17551, 2024

Maria Krinner, Angel Romero, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron, and Davide Scara- muzza. Time-optimal flight with safety constraints and datadriven dynamics.arXiv preprint arXiv:2403.17551, 2024

work page arXiv 2024

[53] [53]

Actor-critic model predictive control

Angel Romero, Yunlong Song, and Davide Scaramuzza. Actor-critic model predictive control. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14777–14784. IEEE, 2024

work page 2024

[54] [54]

A digi- tal twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping.Robotics and Computer-Integrated Manufacturing, 78:102365, 2022

Yongkui Liu, He Xu, Ding Liu, and Lihui Wang. A digi- tal twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping.Robotics and Computer-Integrated Manufacturing, 78:102365, 2022

work page 2022

[55] [55]

TRANSIC: Sim-to-real policy transfer by learning from online correction

Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, and Li Fei-Fei. TRANSIC: Sim-to-real policy transfer by learning from online correction. In8th Annual Conference on Robot Learning, 2024. URL https:// openreview.net/forum?id=lpjPft4RQT

work page 2024

[56] [56]

Sim-to-real transfer for visual reinforcement learning of deformable object manipu- lation for robot-assisted surgery.IEEE Robotics and Automation Letters, 8(2):560–567, 2022

Paul Maria Scheikl, Eleonora Tagliabue, Bal ´azs Gyenes, Martin Wagner, Diego Dall’Alba, Paolo Fiorini, and Franziska Mathis-Ullrich. Sim-to-real transfer for visual reinforcement learning of deformable object manipu- lation for robot-assisted surgery.IEEE Robotics and Automation Letters, 8(2):560–567, 2022

work page 2022

[57] [57]

Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(12):14745–14759, 2023

Jingda Wu, Yanxin Zhou, Haohan Yang, Zhiyu Huang, and Chen Lv. Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 45(12):14745–14759, 2023

work page 2023

[58] [58]

Collision avoidance in pedestrian-rich environments with deep reinforcement learning.Ieee Access, 9:10357– 10377, 2021

Michael Everett, Yu Fan Chen, and Jonathan P How. Collision avoidance in pedestrian-rich environments with deep reinforcement learning.Ieee Access, 9:10357– 10377, 2021

work page 2021

[59]

Yunlong Song, Kexin Shi, Robert Penicka, and Davide Scaramuzza. Learning perception-aware agile flight in cluttered environments. arXiv preprint arXiv:2210.01841, 2022

[60]

David Hoeller, Lorenz Wellhausen, Farbod Farshidian, and Marco Hutter. Learning a state representation and navigation in cluttered and dynamic environments. IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021

[61]

Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chunwei Xing, and Davide Scaramuzza. Contrastive learning for enhancing robust scene transfer in vision-based agile flight. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5330–5337. IEEE, 2024

[62]

Gabriele Tiboni, Karol Arndt, and Ville Kyrki. DROPO: Sim-to-real transfer with offline domain randomization. Robotics and Autonomous Systems, 166:104432, 2023

[63]

Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018. doi: 10.1109/TRO.2018.2853729

[64]

Wei Xu, Yixi Cai, Dongjiao He, Jiarong Lin, and Fu Zhang. FAST-LIO2: Fast direct LiDAR-inertial odometry. IEEE Transactions on Robotics, 38(4):2053–2073, 2022. doi: 10.1109/TRO.2022.3141876

[65]

Po-Wei Chou, Daniel Maturana, and Sebastian Scherer. Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution. In International Conference on Machine Learning, pages 834–843. PMLR, 2017

[66]

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015

[67]

Jiahao Lin, Hai Zhu, and Javier Alonso-Mora. Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2682–2688. IEEE, 2020

[68]

Zhefan Xu, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, and Kenji Shimada. Onboard dynamic-object detection and tracking for autonomous robot navigation with RGB-D camera. IEEE Robotics and Automation Letters, 9(1):651–658, 2023

[69]

Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles. The International Journal of Robotics Research, 17(7):760–772, 1998

[70]

Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, and Kenji Shimada. Vision-aided UAV navigation and dynamic obstacle avoidance using gradient-based B-spline trajectory optimization. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 1214–1220, 2023. doi: 10.1109/ICRA48891.2023.10160638

[71]

Zhefan Xu, Di Deng, and Kenji Shimada. Autonomous UAV exploration of dynamic environments via incremental sampling and probabilistic roadmap. IEEE Robotics and Automation Letters, 6(2):2729–2736, 2021. doi: 10.1109/LRA.2021.3062008