pith. sign in

arxiv: 2606.03441 · v2 · pith:TRXUWKRHnew · submitted 2026-06-02 · 💻 cs.RO · cs.LG

PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion

Pith reviewed 2026-06-28 09:24 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords reinforcement learningvision-based perchingquadrotor controlagile flightmoving inclined platformsroboticsautonomous landing
0
0 comments X

The pith

A two-stage reinforcement learning method enables quadrotors to perch on inclined platforms moving rapidly and irregularly using only vision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Quadrotors struggle to perch on moving inclined surfaces because their cameras have a narrow field of view that frequently loses sight of the target. The paper presents PerchRL, which first pre-trains a policy with complete state data and then fine-tunes it on raw images, adding randomized platform paths during training and temporal augmentation of past observations to capture motion patterns. A visibility-aware augmentation step and rewards that encourage active looking help the policy recover when the platform disappears from view. Simulation and hardware tests show the resulting policies perch successfully in real time and transfer to different quadrotor bodies without retuning.

Core claim

PerchRL shows that a reinforcement learning policy trained in two stages—state-based pre-training followed by vision-based fine-tuning—can achieve stable agile perching on inclined platforms under rapid irregular motion when the training distribution includes randomized trajectories, temporal history augmentation, visibility-aware image augmentation, and active-perception rewards.

What carries the argument

Two-stage RL pipeline that pre-trains on full state then fine-tunes on vision, with randomized platform trajectories, temporal augmentation of observations, visibility-aware state augmentation, and active perception rewards.

If this is right

  • The learned policies run in real time on physical quadrotors.
  • The same policy transfers across distinct quadrotor platforms without retuning.
  • Successful perching occurs under both simulated and real rapid irregular platform motion.
  • The hybrid visibility and active-perception components maintain performance during intermittent visual loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training recipe could be applied to perching on other surface orientations if the randomization range is expanded accordingly.
  • Temporal augmentation may allow the policy to anticipate short-term platform motion even without an explicit predictor.
  • Cross-platform success suggests the policy has captured platform-agnostic dynamics rather than hardware-specific parameters.

Load-bearing premise

Randomized platform trajectories during training plus temporal augmentation will produce policies that generalize to the distribution of real-world irregular motions without requiring platform-specific tuning.

What would settle it

Real-world trials in which the platform follows an irregular trajectory outside the randomized training distribution produce repeated perching failures or loss of stability.

Figures

Figures reproduced from arXiv: 2606.03441 by Boyu Zhou, Huaxu Li, Jie Mei, Jinqiang Cui, U Kei Cheang, Youmin Gong, Zihong Lu, Zongzhuo Liu.

Figure 1
Figure 1. Figure 1: Real-world demonstration of PerchRL. The ground vehicle equipped with a [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: System overview of PerchRL, including the training pipeline and deployment architecture. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Baseline and ablation studies of state-based perching under diverse platform motion pat [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The training curves of success rate and average [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Illustration of the Normal-I experiment. (a) Snap￾shot of the whole process. The quadrotor perches on the platform, which follows a spline trajectory at approximately 2.0 m/s with a 70◦ inclination, and the inset shows the key hardware components of our custom-built quadrotor. (b) 3D trajectories of the quadrotor and platform. (c)(d) Velocity and pitch angle curves of the quadrotor and platform. Normal-I E… view at source ↗
Figure 6
Figure 6. Figure 6: Performance comparison between the state-based and vision-based policies. The heatmaps [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Real-world demonstration of PerchRL for state-based perching on a [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Hardware platforms for the vision-based perching experiments. (a) An illustration of the [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Snapshots across all experimental scenarios. [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗
read the original abstract

Autonomous vision-based perching of quadrotors on moving inclined platforms is critical for air-ground collaboration but remains challenging due to the limited field of view (FOV). In this paper, we propose PerchRL, a reinforcement learning (RL) framework for vision-based agile perching on inclined platforms under rapid and irregular motion. Specifically, we employ a two-stage learning strategy consisting of state-based pre-training followed by vision-based fine-tuning. To improve generalization across diverse platform motions, we employ randomized platform trajectories to prevent overfitting and temporal augmentation methods to capture latent motion patterns from historical observations. During vision-based fine-tuning, a hybrid learning framework consisting of visibility-aware state augmentation and active perception rewards is presented to improve robustness under intermittent visual loss. Extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance of PerchRL, while successful deployment across distinct quadrotor platforms further validates its adaptability. The source code will be released to benefit the community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes PerchRL, a two-stage RL framework for vision-based agile perching of quadrotors on inclined platforms undergoing rapid and irregular motion. It consists of state-based pre-training on randomized trajectories with temporal augmentation to improve generalization, followed by vision-based fine-tuning using a hybrid framework with visibility-aware state augmentation and active perception rewards to handle intermittent visual loss. The central claim is that extensive simulation and real-world experiments demonstrate feasibility, stability, real-time performance, and adaptability across distinct quadrotor platforms without platform-specific tuning.

Significance. If the experimental results hold with proper quantitative support, the work would contribute to agile aerial robotics by offering a generalizable approach to vision-based perching under challenging motion conditions, with potential applications in air-ground collaboration. The planned release of source code would support reproducibility in the field.

major comments (3)
  1. [Abstract, §4] Abstract and §4 (Experiments): The central claim that 'extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance' is unsupported, as the manuscript provides no quantitative metrics, success rates, baseline comparisons, failure rates, or error bars to substantiate performance or generalization.
  2. [§3.2, §3.3] §3.2 (two-stage pipeline) and §3.3 (hybrid vision fine-tuning): The adaptability claim without platform-specific tuning rests on randomized trajectories plus temporal augmentation producing policies robust to real irregular motions, but no distributional validation (e.g., power spectra, acceleration histograms, or KL divergence between training and test motions) or ablations isolating their effect on cross-platform success is reported.
  3. [§4] §4 (real-world deployment): The assertion of successful deployment across distinct quadrotor platforms validating adaptability lacks any reported quantitative cross-platform metrics or controls for test diversity, leaving open whether results reflect the claimed generalization or limited test conditions.
minor comments (2)
  1. [§3] Notation for the temporal augmentation and active perception reward terms is introduced without explicit equations or parameter definitions, reducing clarity for readers attempting to reproduce the method.
  2. [Abstract] The abstract states source code will be released, but no link or repository is provided in the manuscript.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened with additional quantitative support. We address each major comment below and will revise the manuscript to incorporate the suggested analyses and metrics.

read point-by-point responses
  1. Referee: [Abstract, §4] Abstract and §4 (Experiments): The central claim that 'extensive simulation and real-world experiments demonstrate the feasibility, stability, and real-time performance' is unsupported, as the manuscript provides no quantitative metrics, success rates, baseline comparisons, failure rates, or error bars to substantiate performance or generalization.

    Authors: We agree that the abstract and experiments section would benefit from explicit quantitative metrics. In the revised manuscript, we will add tables reporting success rates, failure rates, baseline comparisons, and error bars from repeated trials to substantiate the claims of feasibility, stability, and real-time performance. revision: yes

  2. Referee: [§3.2, §3.3] §3.2 (two-stage pipeline) and §3.3 (hybrid vision fine-tuning): The adaptability claim without platform-specific tuning rests on randomized trajectories plus temporal augmentation producing policies robust to real irregular motions, but no distributional validation (e.g., power spectra, acceleration histograms, or KL divergence between training and test motions) or ablations isolating their effect on cross-platform success is reported.

    Authors: The referee is correct that distributional validation and targeted ablations are not currently reported. We will include comparisons of motion distributions (power spectra, acceleration histograms, KL divergence) between training and test sets, as well as ablation studies isolating the impact of randomized trajectories and temporal augmentation on cross-platform generalization. revision: yes

  3. Referee: [§4] §4 (real-world deployment): The assertion of successful deployment across distinct quadrotor platforms validating adaptability lacks any reported quantitative cross-platform metrics or controls for test diversity, leaving open whether results reflect the claimed generalization or limited test conditions.

    Authors: We acknowledge the need for quantitative cross-platform metrics. The revision will report specific success rates, performance metrics, and controls for test diversity across the distinct quadrotor platforms to better support the adaptability claim. revision: yes

Circularity Check

0 steps flagged

No circularity in claimed results or derivation chain

full rationale

The paper describes an empirical RL pipeline (two-stage training with randomized trajectories and temporal augmentation, followed by hybrid vision fine-tuning) and supports its claims exclusively via simulation and real-world experiments on multiple platforms. No equations, fitted parameters, or self-citations are presented that would make reported success rates or generalization reduce to the training choices by construction. The central claims rest on external experimental outcomes rather than any self-definitional, fitted-input, or uniqueness-imported mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard RL assumptions about reward design and randomization that are not detailed.

pith-pipeline@v0.9.1-grok · 5729 in / 945 out tokens · 19737 ms · 2026-06-28T09:24:37.657902+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    D. N. Das, R. Sewani, J. Wang, and M. K. Tiwari. Synchronized truck and drone routing in package delivery logistics.IEEE Transactions on Intelligent Transportation Systems, 22(9): 5772–5782, 2020

  2. [2]

    G. Wu, N. Mao, Q. Luo, B. Xu, J. Shi, and P. N. Suganthan. Collaborative truck-drone routing for contactless parcel delivery during the epidemic.IEEE Transactions on Intelligent Trans- portation Systems, 23(12):25077–25091, 2022

  3. [3]

    Y . Liu, Z. Liu, J. Shi, G. Wu, and W. Pedrycz. Two-echelon routing problem for parcel de- livery by cooperated truck and drone.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(12):7450–7465, 2020

  4. [4]

    G. Wu, W. Pedrycz, H. Li, M. Ma, and J. Liu. Coordinated planning of heterogeneous earth observation resources.IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(1): 109–125, 2015

  5. [5]

    Tokekar, J

    P. Tokekar, J. Vander Hook, D. Mulla, and V . Isler. Sensor planning for a symbiotic uav and ugv system for precision agriculture.IEEE transactions on robotics, 32(6):1498–1511, 2016

  6. [6]

    Krogius, A

    M. Krogius, A. Haggenmiller, and E. Olson. Flexible layouts for fiducial tags. In2019 ieee/rsj international conference on intelligent robots and systems (iros), pages 1898–1903. IEEE, 2019

  7. [7]

    B. Wen, C. Mitash, B. Ren, and K. E. Bekris. se (3)-tracknet: Data-driven 6d pose tracking by calibrating image residuals in synthetic domains. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10367–10373. IEEE, 2020

  8. [8]

    B. Wen, W. Yang, J. Kautz, and S. Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17868–17879, 2024

  9. [9]

    Liang, Y

    T. Liang, Y . Zeng, J. Xie, and B. Zhou. Dynamicpose: Real-time and robust 6d object pose tracking for fast-moving cameras and objects. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2424–2431. IEEE, 2025

  10. [10]

    Mellinger, N

    D. Mellinger, N. Michael, and V . Kumar. Trajectory generation and control for precise ag- gressive maneuvers with quadrotors.The International Journal of Robotics Research, 31(5): 664–674, 2012

  11. [11]

    Thomas, M

    J. Thomas, M. Pope, G. Loianno, E. W. Hawkes, M. A. Estrada, H. Jiang, M. R. Cutkosky, and V . Kumar. Aggressive flight with quadrotors for perching on inclined surfaces.Journal of Mechanisms and Robotics, 8(5):051007, 2016

  12. [12]

    J. Mao, G. Li, S. Nogar, C. Kroninger, and G. Loianno. Aggressive visual perching with quadrotors on inclined surfaces. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5242–5248. IEEE, 2021

  13. [13]

    J. Ji, T. Yang, C. Xu, and F. Gao. Real-time trajectory planning for aerial perching. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10516– 10522. IEEE, 2022

  14. [14]

    Zhang, Z

    Y . Zhang, Z. Wu, and T. Wei. Precise landing on moving platform for quadrotor uav via extended disturbance observer.IEEE Transactions on Intelligent Vehicles, 2024

  15. [15]

    S. Liu, W. Hu, Z. Wang, W. Dong, and X. Sheng. Quadrotors’ perching on moving inclined surfaces using uncertainty tolerant planner and thrust regulation.Robotics and Autonomous Systems, 191:105011, 2025. 10

  16. [16]

    J. Mao, S. Nogar, C. M. Kroninger, and G. Loianno. Robust active visual perching with quadrotors on inclined surfaces.IEEE Transactions on Robotics, 39(3):1836–1852, 2023

  17. [17]

    Y . Gao, J. Ji, Q. Wang, R. Jin, Y . Lin, Z. Shang, Y . Cao, S. Shen, C. Xu, and F. Gao. Adaptive tracking and perching for quadrotor in dynamic scenarios.IEEE Transactions on Robotics, 40: 499–519, 2023

  18. [18]

    Polvara, M

    R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton, and A. Cangelosi. Toward end-to-end control for uav autonomous landing via deep reinforcement learning. In 2018 International conference on unmanned aircraft systems (ICUAS), pages 115–123. IEEE, 2018

  19. [19]

    J. E. Kooi and R. Babu ˇska. Inclined quadrotor landing using deep reinforcement learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2361–2368. IEEE, 2021

  20. [20]

    van der Heijden, J

    B. van der Heijden, J. Luijkx, L. Ferranti, J. Kober, and R. Babuska. Eagerx: Graph-based framework for sim2real robot learning.arXiv preprint arXiv:2407.04328, 2024

  21. [21]

    Rodriguez-Ramos, C

    A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. De La Puente, and P. Campoy. A deep reinforcement learning strategy for uav autonomous landing on a moving platform.Journal of Intelligent & Robotic Systems, 93(1):351–366, 2019

  22. [22]

    Goldschmid and A

    P. Goldschmid and A. Ahmad. Reinforcement learning based autonomous multi-rotor landing on moving platforms.Autonomous Robots, 48(4):13, 2024

  23. [23]

    C. Wang, J. Wang, C. Wei, Y . Zhu, D. Yin, and J. Li. Vision-based deep reinforcement learning of uav-ugv collaborative landing policy using automatic curriculum.Drones, 7(11):676, 2023

  24. [24]

    Ladosz, M

    P. Ladosz, M. Mammadov, H. Shin, W. Shin, and H. Oh. Autonomous landing on a mov- ing platform using vision-based deep reinforcement learning.IEEE Robotics and Automation Letters, 9(5):4575–4582, 2024

  25. [25]

    W. Shin, M. Kim, T. Park, G. Bae, S. Kim, and H. Oh. Vision-based autonomous drone landing on moving platforms with uncertain motion via deep reinforcement learning.IEEE Robotics and Automation Letters, 2026

  26. [26]

    Kaufmann, L

    E. Kaufmann, L. Bauersfeld, and D. Scaramuzza. A benchmark comparison of learned control policies for agile quadrotor flight. In2022 International Conference on Robotics and Automa- tion (ICRA), pages 10504–10510. IEEE, 2022

  27. [27]

    L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observ- able stochastic domains.Artificial intelligence, 101(1-2):99–134, 1998

  28. [28]

    J. Xing, A. Romero, L. Bauersfeld, and D. Scaramuzza. Bootstrapping reinforcement learning with imitation for vision-based agile flight.arXiv preprint arXiv:2403.12203, 2024

  29. [29]

    T. Wu, Y . Chen, T. Chen, G. Zhao, and F. Gao. Whole-body control through narrow gaps from pixels to action. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11317–11324. IEEE, 2025

  30. [30]

    Huang, J

    Y . Huang, J. Du, Z. Yang, Z. Zhou, L. Zhang, and H. Chen. A survey on trajectory-prediction methods for autonomous driving.IEEE transactions on intelligent vehicles, 7(3):652–674, 2022

  31. [31]

    G. Chang. Robust kalman filtering based on mahalanobis distance as outlier judging criterion. Journal of Geodesy, 88(4):391–401, 2014

  32. [32]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017. 11

  33. [33]

    B. Xu, F. Gao, C. Yu, R. Zhang, Y . Wu, and Y . Wang. Omnidrones: An efficient and flexible platform for reinforcement learning in drone control.IEEE Robotics and Automation Letters, 9(3):2838–2844, 2024

  34. [34]

    J. Chen, C. Yu, Y . Xie, F. Gao, Y . Chen, S. Yu, W. Tang, S. Ji, M. Mu, Y . Wu, et al. What matters in learning a zero-shot sim-to-real rl policy for quadrotor control? a comprehensive study.IEEE Robotics and Automation Letters, 2025

  35. [35]

    J. A. Preiss, W. Honig, G. S. Sukhatme, and N. Ayanian. Crazyswarm: A large nano- quadcopter swarm. In2017 IEEE International Conference on Robotics and Automation (ICRA), pages 3299–3304. IEEE, 2017. 12 A Supplementary Materials A.1 System Dynamics for Policy Training Quadrotor Dynamics:The quadrotor is modeled with standard dynamics as follows: ˙p=v, m ...