PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion

Boyu Zhou; Huaxu Li; Jie Mei; Jinqiang Cui; U Kei Cheang; Youmin Gong; Zihong Lu; Zongzhuo Liu

arxiv: 2606.03441 · v2 · pith:TRXUWKRHnew · submitted 2026-06-02 · 💻 cs.RO · cs.LG

PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion

Zihong Lu , Zongzhuo Liu , Huaxu Li , Jinqiang Cui , Jie Mei , Youmin Gong , U Kei Cheang , Boyu Zhou This is my paper

Pith reviewed 2026-06-28 09:24 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords reinforcement learningvision-based perchingquadrotor controlagile flightmoving inclined platformsroboticsautonomous landing

0 comments

The pith

A two-stage reinforcement learning method enables quadrotors to perch on inclined platforms moving rapidly and irregularly using only vision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Quadrotors struggle to perch on moving inclined surfaces because their cameras have a narrow field of view that frequently loses sight of the target. The paper presents PerchRL, which first pre-trains a policy with complete state data and then fine-tunes it on raw images, adding randomized platform paths during training and temporal augmentation of past observations to capture motion patterns. A visibility-aware augmentation step and rewards that encourage active looking help the policy recover when the platform disappears from view. Simulation and hardware tests show the resulting policies perch successfully in real time and transfer to different quadrotor bodies without retuning.

Core claim

PerchRL shows that a reinforcement learning policy trained in two stages—state-based pre-training followed by vision-based fine-tuning—can achieve stable agile perching on inclined platforms under rapid irregular motion when the training distribution includes randomized trajectories, temporal history augmentation, visibility-aware image augmentation, and active-perception rewards.

What carries the argument

Two-stage RL pipeline that pre-trains on full state then fine-tunes on vision, with randomized platform trajectories, temporal augmentation of observations, visibility-aware state augmentation, and active perception rewards.

If this is right

The learned policies run in real time on physical quadrotors.
The same policy transfers across distinct quadrotor platforms without retuning.
Successful perching occurs under both simulated and real rapid irregular platform motion.
The hybrid visibility and active-perception components maintain performance during intermittent visual loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same training recipe could be applied to perching on other surface orientations if the randomization range is expanded accordingly.
Temporal augmentation may allow the policy to anticipate short-term platform motion even without an explicit predictor.
Cross-platform success suggests the policy has captured platform-agnostic dynamics rather than hardware-specific parameters.

Load-bearing premise

Randomized platform trajectories during training plus temporal augmentation will produce policies that generalize to the distribution of real-world irregular motions without requiring platform-specific tuning.

What would settle it

Real-world trials in which the platform follows an irregular trajectory outside the randomized training distribution produce repeated perching failures or loss of stability.

Figures

Figures reproduced from arXiv: 2606.03441 by Boyu Zhou, Huaxu Li, Jie Mei, Jinqiang Cui, U Kei Cheang, Youmin Gong, Zihong Lu, Zongzhuo Liu.

**Figure 2.** Figure 2: System overview of PerchRL, including the training pipeline and deployment architecture. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Baseline and ablation studies of state-based perching under diverse platform motion pat [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The training curves of success rate and average [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Illustration of the Normal-I experiment. (a) Snapshot of the whole process. The quadrotor perches on the platform, which follows a spline trajectory at approximately 2.0 m/s with a 70◦ inclination, and the inset shows the key hardware components of our custom-built quadrotor. (b) 3D trajectories of the quadrotor and platform. (c)(d) Velocity and pitch angle curves of the quadrotor and platform. Normal-I E… view at source ↗

**Figure 6.** Figure 6: Performance comparison between the state-based and vision-based policies. The heatmaps [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

**Figure 7.** Figure 7: Real-world demonstration of PerchRL for state-based perching on a [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Hardware platforms for the vision-based perching experiments. (a) An illustration of the [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Snapshots across all experimental scenarios. [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

read the original abstract

Autonomous vision-based perching of quadrotors on moving inclined platforms is critical for air-ground collaboration but remains challenging due to the limited field of view (FOV). In this paper, we propose PerchRL, a reinforcement learning (RL) framework for vision-based agile perching on inclined platforms under rapid and irregular motion. Specifically, we employ a two-stage learning strategy consisting of state-based pre-training followed by vision-based fine-tuning. To improve generalization across diverse platform motions, we employ randomized platform trajectories to prevent overfitting and temporal augmentation methods to capture latent motion patterns from historical observations. During vision-based fine-tuning, a hybrid learning framework consisting of visibility-aware state augmentation and active perception rewards is presented to improve robustness under intermittent visual loss. Extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance of PerchRL, while successful deployment across distinct quadrotor platforms further validates its adaptability. The source code will be released to benefit the community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Two-stage RL pipeline with visibility tricks for perching on moving platforms, but abstract gives no metrics or comparisons to support the real-world claims.

read the letter

The paper introduces PerchRL as a two-stage RL method: state-based pre-training on randomized platform trajectories, followed by vision-based fine-tuning that adds visibility-aware state augmentation and active perception rewards to cope with intermittent views on inclined moving targets. Temporal augmentation is used to pick up motion patterns from history. This specific combination for agile perching under rapid irregular motion is presented as the contribution.

It addresses a concrete robotics need for air-ground collaboration and plans to release code, which is useful. The hybrid framework for handling vision loss during fine-tuning is a reasonable engineering step beyond plain RL.

The soft spot is the evaluation. The abstract asserts extensive simulation and real-world experiments plus cross-platform deployment, yet supplies no success rates, baselines, error bars, or timing numbers. The stress-test concern holds here: there is no reported check (spectra, histograms, or divergence) that the randomized training motions match the real irregular ones, so the generalization and no-tuning claims rest on unshown evidence. No load-bearing equations or fitted parameters appear that would make results circular.

This is for researchers working on vision-based drone control and RL for dynamic landing tasks. A reader could extract the pipeline structure and reward ideas even without the numbers. It deserves peer review because the problem is practical and the methods are described at a level that referees can assess, though the paper will need added quantitative results and ablations to stand up.

Referee Report

3 major / 2 minor

Summary. The paper proposes PerchRL, a two-stage RL framework for vision-based agile perching of quadrotors on inclined platforms undergoing rapid and irregular motion. It consists of state-based pre-training on randomized trajectories with temporal augmentation to improve generalization, followed by vision-based fine-tuning using a hybrid framework with visibility-aware state augmentation and active perception rewards to handle intermittent visual loss. The central claim is that extensive simulation and real-world experiments demonstrate feasibility, stability, real-time performance, and adaptability across distinct quadrotor platforms without platform-specific tuning.

Significance. If the experimental results hold with proper quantitative support, the work would contribute to agile aerial robotics by offering a generalizable approach to vision-based perching under challenging motion conditions, with potential applications in air-ground collaboration. The planned release of source code would support reproducibility in the field.

major comments (3)

[Abstract, §4] Abstract and §4 (Experiments): The central claim that 'extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance' is unsupported, as the manuscript provides no quantitative metrics, success rates, baseline comparisons, failure rates, or error bars to substantiate performance or generalization.
[§3.2, §3.3] §3.2 (two-stage pipeline) and §3.3 (hybrid vision fine-tuning): The adaptability claim without platform-specific tuning rests on randomized trajectories plus temporal augmentation producing policies robust to real irregular motions, but no distributional validation (e.g., power spectra, acceleration histograms, or KL divergence between training and test motions) or ablations isolating their effect on cross-platform success is reported.
[§4] §4 (real-world deployment): The assertion of successful deployment across distinct quadrotor platforms validating adaptability lacks any reported quantitative cross-platform metrics or controls for test diversity, leaving open whether results reflect the claimed generalization or limited test conditions.

minor comments (2)

[§3] Notation for the temporal augmentation and active perception reward terms is introduced without explicit equations or parameter definitions, reducing clarity for readers attempting to reproduce the method.
[Abstract] The abstract states source code will be released, but no link or repository is provided in the manuscript.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened with additional quantitative support. We address each major comment below and will revise the manuscript to incorporate the suggested analyses and metrics.

read point-by-point responses

Referee: [Abstract, §4] Abstract and §4 (Experiments): The central claim that 'extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance' is unsupported, as the manuscript provides no quantitative metrics, success rates, baseline comparisons, failure rates, or error bars to substantiate performance or generalization.

Authors: We agree that the abstract and experiments section would benefit from explicit quantitative metrics. In the revised manuscript, we will add tables reporting success rates, failure rates, baseline comparisons, and error bars from repeated trials to substantiate the claims of feasibility, stability, and real-time performance. revision: yes
Referee: [§3.2, §3.3] §3.2 (two-stage pipeline) and §3.3 (hybrid vision fine-tuning): The adaptability claim without platform-specific tuning rests on randomized trajectories plus temporal augmentation producing policies robust to real irregular motions, but no distributional validation (e.g., power spectra, acceleration histograms, or KL divergence between training and test motions) or ablations isolating their effect on cross-platform success is reported.

Authors: The referee is correct that distributional validation and targeted ablations are not currently reported. We will include comparisons of motion distributions (power spectra, acceleration histograms, KL divergence) between training and test sets, as well as ablation studies isolating the impact of randomized trajectories and temporal augmentation on cross-platform generalization. revision: yes
Referee: [§4] §4 (real-world deployment): The assertion of successful deployment across distinct quadrotor platforms validating adaptability lacks any reported quantitative cross-platform metrics or controls for test diversity, leaving open whether results reflect the claimed generalization or limited test conditions.

Authors: We acknowledge the need for quantitative cross-platform metrics. The revision will report specific success rates, performance metrics, and controls for test diversity across the distinct quadrotor platforms to better support the adaptability claim. revision: yes

Circularity Check

0 steps flagged

No circularity in claimed results or derivation chain

full rationale

The paper describes an empirical RL pipeline (two-stage training with randomized trajectories and temporal augmentation, followed by hybrid vision fine-tuning) and supports its claims exclusively via simulation and real-world experiments on multiple platforms. No equations, fitted parameters, or self-citations are presented that would make reported success rates or generalization reduce to the training choices by construction. The central claims rest on external experimental outcomes rather than any self-definitional, fitted-input, or uniqueness-imported mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard RL assumptions about reward design and randomization that are not detailed.

pith-pipeline@v0.9.1-grok · 5729 in / 945 out tokens · 19737 ms · 2026-06-28T09:24:37.657902+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 3 canonical work pages · 1 internal anchor

[1]

D. N. Das, R. Sewani, J. Wang, and M. K. Tiwari. Synchronized truck and drone routing in package delivery logistics.IEEE Transactions on Intelligent Transportation Systems, 22(9): 5772–5782, 2020

2020
[2]

G. Wu, N. Mao, Q. Luo, B. Xu, J. Shi, and P. N. Suganthan. Collaborative truck-drone routing for contactless parcel delivery during the epidemic.IEEE Transactions on Intelligent Trans- portation Systems, 23(12):25077–25091, 2022

2022
[3]

Y . Liu, Z. Liu, J. Shi, G. Wu, and W. Pedrycz. Two-echelon routing problem for parcel de- livery by cooperated truck and drone.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(12):7450–7465, 2020

2020
[4]

G. Wu, W. Pedrycz, H. Li, M. Ma, and J. Liu. Coordinated planning of heterogeneous earth observation resources.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(1): 109–125, 2015

2015
[5]

Tokekar, J

P. Tokekar, J. Vander Hook, D. Mulla, and V . Isler. Sensor planning for a symbiotic uav and ugv system for precision agriculture.IEEE transactions on robotics, 32(6):1498–1511, 2016

2016
[6]

Krogius, A

M. Krogius, A. Haggenmiller, and E. Olson. Flexible layouts for fiducial tags. In2019 ieee/rsj international conference on intelligent robots and systems (iros), pages 1898–1903. IEEE, 2019

1903
[7]

B. Wen, C. Mitash, B. Ren, and K. E. Bekris. se (3)-tracknet: Data-driven 6d pose tracking by calibrating image residuals in synthetic domains. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10367–10373. IEEE, 2020

2020
[8]

B. Wen, W. Yang, J. Kautz, and S. Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17868–17879, 2024

2024
[9]

Liang, Y

T. Liang, Y . Zeng, J. Xie, and B. Zhou. Dynamicpose: Real-time and robust 6d object pose tracking for fast-moving cameras and objects. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2424–2431. IEEE, 2025

2025
[10]

Mellinger, N

D. Mellinger, N. Michael, and V . Kumar. Trajectory generation and control for precise ag- gressive maneuvers with quadrotors.The International Journal of Robotics Research, 31(5): 664–674, 2012

2012
[11]

Thomas, M

J. Thomas, M. Pope, G. Loianno, E. W. Hawkes, M. A. Estrada, H. Jiang, M. R. Cutkosky, and V . Kumar. Aggressive flight with quadrotors for perching on inclined surfaces.Journal of Mechanisms and Robotics, 8(5):051007, 2016

2016
[12]

J. Mao, G. Li, S. Nogar, C. Kroninger, and G. Loianno. Aggressive visual perching with quadrotors on inclined surfaces. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5242–5248. IEEE, 2021

2021
[13]

J. Ji, T. Yang, C. Xu, and F. Gao. Real-time trajectory planning for aerial perching. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10516– 10522. IEEE, 2022

2022
[14]

Zhang, Z

Y . Zhang, Z. Wu, and T. Wei. Precise landing on moving platform for quadrotor uav via extended disturbance observer.IEEE Transactions on Intelligent Vehicles, 2024

2024
[15]

S. Liu, W. Hu, Z. Wang, W. Dong, and X. Sheng. Quadrotors’ perching on moving inclined surfaces using uncertainty tolerant planner and thrust regulation.Robotics and Autonomous Systems, 191:105011, 2025. 10

2025
[16]

J. Mao, S. Nogar, C. M. Kroninger, and G. Loianno. Robust active visual perching with quadrotors on inclined surfaces.IEEE Transactions on Robotics, 39(3):1836–1852, 2023

2023
[17]

Y . Gao, J. Ji, Q. Wang, R. Jin, Y . Lin, Z. Shang, Y . Cao, S. Shen, C. Xu, and F. Gao. Adaptive tracking and perching for quadrotor in dynamic scenarios.IEEE Transactions on Robotics, 40: 499–519, 2023

2023
[18]

Polvara, M

R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, and A. Cangelosi. Toward end-to-end control for uav autonomous landing via deep reinforcement learning. In 2018 International conference on unmanned aircraft systems (ICUAS), pages 115–123. IEEE, 2018

2018
[19]

J. E. Kooi and R. Babu ˇska. Inclined quadrotor landing using deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2361–2368. IEEE, 2021

2021
[20]

van der Heijden, J

B. van der Heijden, J. Luijkx, L. Ferranti, J. Kober, and R. Babuska. Eagerx: Graph-based framework for sim2real robot learning.arXiv preprint arXiv:2407.04328, 2024

work page arXiv 2024
[21]

Rodriguez-Ramos, C

A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. De La Puente, and P. Campoy. A deep reinforcement learning strategy for uav autonomous landing on a moving platform.Journal of Intelligent & Robotic Systems, 93(1):351–366, 2019

2019
[22]

Goldschmid and A

P. Goldschmid and A. Ahmad. Reinforcement learning based autonomous multi-rotor landing on moving platforms.Autonomous Robots, 48(4):13, 2024

2024
[23]

C. Wang, J. Wang, C. Wei, Y . Zhu, D. Yin, and J. Li. Vision-based deep reinforcement learning of uav-ugv collaborative landing policy using automatic curriculum.Drones, 7(11):676, 2023

2023
[24]

Ladosz, M

P. Ladosz, M. Mammadov, H. Shin, W. Shin, and H. Oh. Autonomous landing on a mov- ing platform using vision-based deep reinforcement learning.IEEE Robotics and Automation Letters, 9(5):4575–4582, 2024

2024
[25]

W. Shin, M. Kim, T. Park, G. Bae, S. Kim, and H. Oh. Vision-based autonomous drone landing on moving platforms with uncertain motion via deep reinforcement learning.IEEE Robotics and Automation Letters, 2026

2026
[26]

Kaufmann, L

E. Kaufmann, L. Bauersfeld, and D. Scaramuzza. A benchmark comparison of learned control policies for agile quadrotor flight. In2022 International Conference on Robotics and Automa- tion (ICRA), pages 10504–10510. IEEE, 2022

2022
[27]

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observ- able stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

1998
[28]

J. Xing, A. Romero, L. Bauersfeld, and D. Scaramuzza. Bootstrapping reinforcement learning with imitation for vision-based agile flight.arXiv preprint arXiv:2403.12203, 2024

work page arXiv 2024
[29]

T. Wu, Y . Chen, T. Chen, G. Zhao, and F. Gao. Whole-body control through narrow gaps from pixels to action. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11317–11324. IEEE, 2025

2025
[30]

Huang, J

Y . Huang, J. Du, Z. Yang, Z. Zhou, L. Zhang, and H. Chen. A survey on trajectory-prediction methods for autonomous driving.IEEE transactions on intelligent vehicles, 7(3):652–674, 2022

2022
[31]

G. Chang. Robust kalman filtering based on mahalanobis distance as outlier judging criterion. Journal of Geodesy, 88(4):391–401, 2014

2014
[32]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017. 11

work page internal anchor Pith review Pith/arXiv arXiv 2017
[33]

B. Xu, F. Gao, C. Yu, R. Zhang, Y . Wu, and Y . Wang. Omnidrones: An efficient and flexible platform for reinforcement learning in drone control.IEEE Robotics and Automation Letters, 9(3):2838–2844, 2024

2024
[34]

J. Chen, C. Yu, Y . Xie, F. Gao, Y . Chen, S. Yu, W. Tang, S. Ji, M. Mu, Y . Wu, et al. What matters in learning a zero-shot sim-to-real rl policy for quadrotor control? a comprehensive study.IEEE Robotics and Automation Letters, 2025

2025
[35]

J. A. Preiss, W. Honig, G. S. Sukhatme, and N. Ayanian. Crazyswarm: A large nano- quadcopter swarm. In2017 IEEE International Conference on Robotics and Automation (ICRA), pages 3299–3304. IEEE, 2017. 12 A Supplementary Materials A.1 System Dynamics for Policy Training Quadrotor Dynamics:The quadrotor is modeled with standard dynamics as follows: ˙p=v, m ...

2017

[1] [1]

D. N. Das, R. Sewani, J. Wang, and M. K. Tiwari. Synchronized truck and drone routing in package delivery logistics.IEEE Transactions on Intelligent Transportation Systems, 22(9): 5772–5782, 2020

2020

[2] [2]

G. Wu, N. Mao, Q. Luo, B. Xu, J. Shi, and P. N. Suganthan. Collaborative truck-drone routing for contactless parcel delivery during the epidemic.IEEE Transactions on Intelligent Trans- portation Systems, 23(12):25077–25091, 2022

2022

[3] [3]

Y . Liu, Z. Liu, J. Shi, G. Wu, and W. Pedrycz. Two-echelon routing problem for parcel de- livery by cooperated truck and drone.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(12):7450–7465, 2020

2020

[4] [4]

G. Wu, W. Pedrycz, H. Li, M. Ma, and J. Liu. Coordinated planning of heterogeneous earth observation resources.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(1): 109–125, 2015

2015

[5] [5]

Tokekar, J

P. Tokekar, J. Vander Hook, D. Mulla, and V . Isler. Sensor planning for a symbiotic uav and ugv system for precision agriculture.IEEE transactions on robotics, 32(6):1498–1511, 2016

2016

[6] [6]

Krogius, A

M. Krogius, A. Haggenmiller, and E. Olson. Flexible layouts for fiducial tags. In2019 ieee/rsj international conference on intelligent robots and systems (iros), pages 1898–1903. IEEE, 2019

1903

[7] [7]

B. Wen, C. Mitash, B. Ren, and K. E. Bekris. se (3)-tracknet: Data-driven 6d pose tracking by calibrating image residuals in synthetic domains. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10367–10373. IEEE, 2020

2020

[8] [8]

B. Wen, W. Yang, J. Kautz, and S. Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17868–17879, 2024

2024

[9] [9]

Liang, Y

T. Liang, Y . Zeng, J. Xie, and B. Zhou. Dynamicpose: Real-time and robust 6d object pose tracking for fast-moving cameras and objects. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2424–2431. IEEE, 2025

2025

[10] [10]

Mellinger, N

D. Mellinger, N. Michael, and V . Kumar. Trajectory generation and control for precise ag- gressive maneuvers with quadrotors.The International Journal of Robotics Research, 31(5): 664–674, 2012

2012

[11] [11]

Thomas, M

J. Thomas, M. Pope, G. Loianno, E. W. Hawkes, M. A. Estrada, H. Jiang, M. R. Cutkosky, and V . Kumar. Aggressive flight with quadrotors for perching on inclined surfaces.Journal of Mechanisms and Robotics, 8(5):051007, 2016

2016

[12] [12]

J. Mao, G. Li, S. Nogar, C. Kroninger, and G. Loianno. Aggressive visual perching with quadrotors on inclined surfaces. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5242–5248. IEEE, 2021

2021

[13] [13]

J. Ji, T. Yang, C. Xu, and F. Gao. Real-time trajectory planning for aerial perching. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10516– 10522. IEEE, 2022

2022

[14] [14]

Zhang, Z

Y . Zhang, Z. Wu, and T. Wei. Precise landing on moving platform for quadrotor uav via extended disturbance observer.IEEE Transactions on Intelligent Vehicles, 2024

2024

[15] [15]

S. Liu, W. Hu, Z. Wang, W. Dong, and X. Sheng. Quadrotors’ perching on moving inclined surfaces using uncertainty tolerant planner and thrust regulation.Robotics and Autonomous Systems, 191:105011, 2025. 10

2025

[16] [16]

J. Mao, S. Nogar, C. M. Kroninger, and G. Loianno. Robust active visual perching with quadrotors on inclined surfaces.IEEE Transactions on Robotics, 39(3):1836–1852, 2023

2023

[17] [17]

Y . Gao, J. Ji, Q. Wang, R. Jin, Y . Lin, Z. Shang, Y . Cao, S. Shen, C. Xu, and F. Gao. Adaptive tracking and perching for quadrotor in dynamic scenarios.IEEE Transactions on Robotics, 40: 499–519, 2023

2023

[18] [18]

Polvara, M

R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, and A. Cangelosi. Toward end-to-end control for uav autonomous landing via deep reinforcement learning. In 2018 International conference on unmanned aircraft systems (ICUAS), pages 115–123. IEEE, 2018

2018

[19] [19]

J. E. Kooi and R. Babu ˇska. Inclined quadrotor landing using deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2361–2368. IEEE, 2021

2021

[20] [20]

van der Heijden, J

B. van der Heijden, J. Luijkx, L. Ferranti, J. Kober, and R. Babuska. Eagerx: Graph-based framework for sim2real robot learning.arXiv preprint arXiv:2407.04328, 2024

work page arXiv 2024

[21] [21]

Rodriguez-Ramos, C

A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. De La Puente, and P. Campoy. A deep reinforcement learning strategy for uav autonomous landing on a moving platform.Journal of Intelligent & Robotic Systems, 93(1):351–366, 2019

2019

[22] [22]

Goldschmid and A

P. Goldschmid and A. Ahmad. Reinforcement learning based autonomous multi-rotor landing on moving platforms.Autonomous Robots, 48(4):13, 2024

2024

[23] [23]

C. Wang, J. Wang, C. Wei, Y . Zhu, D. Yin, and J. Li. Vision-based deep reinforcement learning of uav-ugv collaborative landing policy using automatic curriculum.Drones, 7(11):676, 2023

2023

[24] [24]

Ladosz, M

P. Ladosz, M. Mammadov, H. Shin, W. Shin, and H. Oh. Autonomous landing on a mov- ing platform using vision-based deep reinforcement learning.IEEE Robotics and Automation Letters, 9(5):4575–4582, 2024

2024

[25] [25]

W. Shin, M. Kim, T. Park, G. Bae, S. Kim, and H. Oh. Vision-based autonomous drone landing on moving platforms with uncertain motion via deep reinforcement learning.IEEE Robotics and Automation Letters, 2026

2026

[26] [26]

Kaufmann, L

E. Kaufmann, L. Bauersfeld, and D. Scaramuzza. A benchmark comparison of learned control policies for agile quadrotor flight. In2022 International Conference on Robotics and Automa- tion (ICRA), pages 10504–10510. IEEE, 2022

2022

[27] [27]

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observ- able stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

1998

[28] [28]

J. Xing, A. Romero, L. Bauersfeld, and D. Scaramuzza. Bootstrapping reinforcement learning with imitation for vision-based agile flight.arXiv preprint arXiv:2403.12203, 2024

work page arXiv 2024

[29] [29]

T. Wu, Y . Chen, T. Chen, G. Zhao, and F. Gao. Whole-body control through narrow gaps from pixels to action. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11317–11324. IEEE, 2025

2025

[30] [30]

Huang, J

Y . Huang, J. Du, Z. Yang, Z. Zhou, L. Zhang, and H. Chen. A survey on trajectory-prediction methods for autonomous driving.IEEE transactions on intelligent vehicles, 7(3):652–674, 2022

2022

[31] [31]

G. Chang. Robust kalman filtering based on mahalanobis distance as outlier judging criterion. Journal of Geodesy, 88(4):391–401, 2014

2014

[32] [32]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017. 11

work page internal anchor Pith review Pith/arXiv arXiv 2017

[33] [33]

B. Xu, F. Gao, C. Yu, R. Zhang, Y . Wu, and Y . Wang. Omnidrones: An efficient and flexible platform for reinforcement learning in drone control.IEEE Robotics and Automation Letters, 9(3):2838–2844, 2024

2024

[34] [34]

J. Chen, C. Yu, Y . Xie, F. Gao, Y . Chen, S. Yu, W. Tang, S. Ji, M. Mu, Y . Wu, et al. What matters in learning a zero-shot sim-to-real rl policy for quadrotor control? a comprehensive study.IEEE Robotics and Automation Letters, 2025

2025

[35] [35]

J. A. Preiss, W. Honig, G. S. Sukhatme, and N. Ayanian. Crazyswarm: A large nano- quadcopter swarm. In2017 IEEE International Conference on Robotics and Automation (ICRA), pages 3299–3304. IEEE, 2017. 12 A Supplementary Materials A.1 System Dynamics for Policy Training Quadrotor Dynamics:The quadrotor is modeled with standard dynamics as follows: ˙p=v, m ...

2017