pith. sign in

arxiv: 2605.23717 · v1 · pith:ASE5ZVC5new · submitted 2026-05-22 · 💻 cs.RO

Vision-Based Agile Landing on Turbulent Waters

Pith reviewed 2026-05-25 04:02 UTC · model grok-4.3

classification 💻 cs.RO
keywords reinforcement learningautonomous landingmultirotormaritime platformsvisual featureszero-shot transferagile landingturbulent waters
0
0 comments X

The pith

A reinforcement learning policy enables multirotor landing on moving maritime platforms using only vehicle state and local visual features without explicit platform-state information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a reinforcement learning method for landing multirotors on ships in rough open-sea conditions. The policy takes the vehicle's own state measurements together with local visual keypoints and descriptors from the landing surface and outputs attitude and thrust commands that a standard low-level controller tracks. Training happens entirely in simulation with synthetic keypoints and random normalized descriptors so the same policy can be used directly with real feature extractors. This removes the need to maintain a separate model or estimate of the platform's motion. The method is shown to outperform a model-predictive-control baseline in simulation for very rough sea states and to succeed in real flights with two different onboard feature extractors.

Core claim

The central claim is that a reinforcement learning policy trained in simulation on multirotor state plus synthetic keypoints with randomly generated normalized descriptors can predict attitude and thrust commands that enable agile landing on moving maritime platforms, transfers zero-shot to real visual feature extractors, and outperforms a state-of-the-art Model Predictive Control baseline under platform motions corresponding to very rough sea conditions, all without requiring an explicit platform-state representation.

What carries the argument

Reinforcement learning policy that maps multirotor state measurements and local visual keypoints with associated descriptors directly to attitude and thrust commands tracked by a conventional low-level controller.

If this is right

  • The policy outperforms a state-of-the-art Model Predictive Control baseline in simulation under very rough sea conditions.
  • The same policy supports zero-shot deployment with different local feature extractors onboard the UAV.
  • Extensive real-world experiments confirm successful autonomous onboard landings using two different feature extractors.
  • The approach handles the coupled motion of vehicle and platform without maintaining an explicit platform-state estimate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training strategy could be tested on landing tasks involving other moving surfaces such as trucks or smaller boats.
  • Removing the platform-state requirement may lower the sensor and compute load for maritime drone operations.
  • Performance under additional disturbances such as wind gusts or changing lighting could be measured in follow-up trials.
  • The policy could be combined with existing low-level controllers already present on many multirotor platforms.

Load-bearing premise

Policies trained using only synthetic keypoints with randomly generated normalized descriptors transfer zero-shot to real visual feature extractors and real platform dynamics without any platform-state information.

What would settle it

A real-world landing trial in turbulent waters that succeeds when the onboard feature extractor is replaced by one whose descriptor distribution differs markedly from the synthetic training distribution would falsify the zero-shot transfer claim.

Figures

Figures reproduced from arXiv: 2605.23717 by Davide Scaramuzza, Dimosthenis Angelis, Evangelos Boukas, Leonard Bauersfeld.

Figure 1
Figure 1. Figure 1: A quadrotor lands autonomously on a floating platform in a pool subject to manually generated waves. The [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: We train a sensorimotor controller that maps [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the proposed approach. Sparse local features are extracted from consecutive camera images [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Histogram comparison between the proposed ap [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Representative successful landing during the [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
read the original abstract

Autonomous landing of Unmanned Aerial Vehicles on maritime vessels is challenging due to the coupled motion of the vehicle and landing platform in open-sea conditions. This paper presents a reinforcement-learning-based approach for autonomous multirotor landing on moving maritime platforms without requiring explicit platform-state information. The proposed method uses multirotor state measurements together with local visual features, consisting of keypoints and associated descriptors extracted from the landing surface, to predict attitude and thrust commands. These commands are tracked by a conventional low-level controller. The policy is trained in simulation using synthetic keypoints with randomly generated normalized descriptors, enabling zero-shot deployment with different local feature extractors onboard the UAV. We evaluate the method in a realistic simulator and show that it outperforms a state-of-the-art Model Predictive Control baseline under platform motions corresponding to ``Very Rough'' sea conditions. Finally, we perform extensive real-world experiments, demonstrating autonomous onboard landing using two different local feature extractors. To the best of our knowledge, this is the first approach for agile multirotor landing on maritime platforms in turbulent waters that does not rely on an explicit platform-state representation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents a reinforcement-learning-based approach for autonomous multirotor landing on moving maritime platforms without requiring explicit platform-state information. The policy uses multirotor state measurements together with local visual features (keypoints and descriptors) extracted from the landing surface to predict attitude and thrust commands, which are tracked by a low-level controller. The policy is trained in simulation using synthetic keypoints with randomly generated normalized descriptors to enable zero-shot deployment with different real feature extractors. The authors claim that the method outperforms a state-of-the-art MPC baseline in a realistic simulator under 'Very Rough' sea conditions and demonstrate successful real-world experiments with two different local feature extractors, positioning it as the first such approach for agile landing in turbulent waters without explicit platform-state representation.

Significance. If the zero-shot transfer from synthetic random descriptors to real feature extractors holds and the outperformance is substantiated, this would be a meaningful contribution to vision-based maritime UAV control by removing reliance on difficult-to-estimate platform states in dynamic conditions. The empirical RL training pipeline and real-world testing are strengths that could advance practical deployment. However, the current presentation provides no quantitative metrics, error bars, training details, or failure cases, limiting the ability to gauge the result's robustness or impact.

major comments (2)
  1. [Abstract] Abstract: The claim that the method 'outperforms a state-of-the-art Model Predictive Control baseline under platform motions corresponding to ``Very Rough'' sea conditions' provides no quantitative metrics, tables, error bars, or specific results, which is load-bearing for assessing the central performance claim and the data-claim link.
  2. [Abstract] Abstract: The no-explicit-platform-state claim rests on the assumption that training exclusively on multirotor state plus synthetic keypoints with randomly generated normalized descriptors enables the policy to extract relative platform motion cues from real keypoint tracks and real descriptors (from any of two extractors) without platform pose or velocity; however, the randomization strategy is not shown to force learning from geometry/temporal evolution rather than the uniform random distribution, and real descriptors carry structured invariances plus unmodeled effects (camera noise, lighting, sea-spray) absent from the synthetic generator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the method 'outperforms a state-of-the-art Model Predictive Control baseline under platform motions corresponding to ``Very Rough'' sea conditions' provides no quantitative metrics, tables, error bars, or specific results, which is load-bearing for assessing the central performance claim and the data-claim link.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the performance claim. In the revised manuscript we will update the abstract to include key simulation metrics (success rates, mean landing position and attitude errors, and variability measures) comparing our policy to the MPC baseline under the 'Very Rough' sea conditions, with direct references to the corresponding figures and tables in the main text. revision: yes

  2. Referee: [Abstract] Abstract: The no-explicit-platform-state claim rests on the assumption that training exclusively on multirotor state plus synthetic keypoints with randomly generated normalized descriptors enables the policy to extract relative platform motion cues from real keypoint tracks and real descriptors (from any of two extractors) without platform pose or velocity; however, the randomization strategy is not shown to force learning from geometry/temporal evolution rather than the uniform random distribution, and real descriptors carry structured invariances plus unmodeled effects (camera noise, lighting, sea-spray) absent from the synthetic generator.

    Authors: The randomization of normalized descriptors is intended to remove any dependence on specific descriptor values, thereby encouraging the policy to extract relative motion from keypoint geometry and their temporal evolution. We acknowledge that the original submission did not contain an explicit ablation isolating the effect of descriptor randomization. We will add a targeted analysis and expanded methods discussion clarifying this design choice. The successful zero-shot real-world results obtained with two independent feature extractors already provide empirical evidence that the learned policy tolerates the structured differences and unmodeled effects present in real descriptors. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical RL pipeline with no derivations or fitted predictions

full rationale

The paper describes an RL policy trained in simulation on multirotor state plus synthetic keypoints with random descriptors, then deployed zero-shot on real feature extractors. No equations, uniqueness theorems, ansatzes, or parameter-fitting steps are present that could reduce a claimed prediction to its inputs by construction. The central claim (successful landing without explicit platform state) is an empirical outcome of training and testing, not a self-referential definition or renamed fit. Self-citations, if any, are not load-bearing for any derivation chain. This is a standard non-circular empirical robotics paper.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; no model architecture, loss functions, or training hyperparameters are described, so free parameters and axioms cannot be enumerated beyond the high-level assumptions stated in the abstract.

free parameters (1)
  • RL policy network weights
    Weights are learned during simulation training; exact count and regularization unknown from abstract.
axioms (2)
  • domain assumption Conventional low-level controller accurately tracks commanded attitude and thrust
    Explicitly stated in abstract as the interface between policy output and vehicle actuation.
  • ad hoc to paper Synthetic keypoints with random normalized descriptors sufficiently approximate real feature extractor behavior for zero-shot transfer
    Central to the claimed sim-to-real capability described in abstract.

pith-pipeline@v0.9.0 · 5729 in / 1326 out tokens · 28245 ms · 2026-05-25T04:02:19.563231+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

  1. [1]

    A UA V Search and Rescue Scenario with Human Body Detection and Geolocalization,

    P. Doherty and P. Rudol, “A UA V Search and Rescue Scenario with Human Body Detection and Geolocalization,” inAI 2007: Advances in Artificial Intelligence, M. A. Orgun and J. Thornton, Eds. Berlin, Heidelberg: Springer, 2007, pp. 1–13

  2. [2]

    Towards an Autonomous UA V-based System to Assist Search and Rescue Operations in Man Overboard Incidents,

    V . A. Feraru, R. E. Andersen, and E. Boukas, “Towards an Autonomous UA V-based System to Assist Search and Rescue Operations in Man Overboard Incidents,” in2020 IEEE Inter- national Symposium on Safety, Security, and Rescue Robotics (SSRR), November 2020, pp. 57–64, iSSN: 2475-8426

  3. [3]

    UA V Design for Fully Autonomous Man Overboard Detection,

    D. Angelis, R. E. Andersen, V . A. Feraru, and E. Boukas, “UA V Design for Fully Autonomous Man Overboard Detection,” in 2024 IEEE International Conference on Imaging Systems and Techniques (IST), October 2024, pp. 1–6

  4. [4]

    Us- ing low cost open source UA Vs for marine wild life monitoring - Field Report*,

    J. Fortuna, F. Ferreira, R. Gomes, S. Ferreira, and J. Sousa, “Us- ing low cost open source UA Vs for marine wild life monitoring - Field Report*,”IFAC Proceedings Volumes, vol. 46, no. 30, pp. 291–295, January 2013

  5. [5]

    Aerial Localization and Navigation for Surveillance of Large, Featureless, GNSS-Denied Maritime Environments,

    M. Peti, L. Markovi ´c, I. Lon ˇcar, A. Bari ˇsi´c Kula ˇs, F. Petric, A. Milas, J. Gori ˇcanec, M. Car, M. Orsag, B. A. Ferreira, and S. Bogdan, “Aerial Localization and Navigation for Surveillance of Large, Featureless, GNSS-Denied Maritime Environments,”Journal of Field Robotics, vol. 42, no. 8, pp. 4215–4235, 2025, eprint: https://onlinelibrary.wiley....

  6. [6]

    Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat,

    O. Proch ´azka, F. Nov ´ak, T. B ´aˇca, P. M. Gupta, R. P ˇeniˇcka, and M. Saska, “Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat,”Ocean Engineering, vol. 313, p. 119164, December 2024, arXiv:2412.07332 [cs]

  7. [7]

    Vision-Based Autonomous Landing for the UA V: A Review,

    L. Xin, Z. Tang, W. Gai, and H. Liu, “Vision-Based Autonomous Landing for the UA V: A Review,”Aerospace, vol. 9, no. 11, p. 634, November 2022

  8. [8]

    A Review on VTOL Autonomous Landing Strategies on Naval Dynamic Surfaces,

    L. G. Leandro de Paula, V . S. Dwivedi, H.-S. Shin, and A. Tsour- dos, “A Review on VTOL Autonomous Landing Strategies on Naval Dynamic Surfaces,” p. 30 pages, June 2022, artwork Size: 30 pages Medium: application/pdf Version Number: 1.0

  9. [9]

    Drone Land- ing and Reinforcement Learning: State-of-Art, Challenges and Opportunities,

    J. Amendola, L. R. Cenkeramaddi, and A. Jha, “Drone Land- ing and Reinforcement Learning: State-of-Art, Challenges and Opportunities,”IEEE Open Journal of Intelligent Transportation Systems, vol. 5, pp. 520–539, 2024

  10. [10]

    AprilTag: A robust and flexible visual fiducial sys- tem,

    E. Olson, “AprilTag: A robust and flexible visual fiducial sys- tem,” in2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 3400–3407, iSSN: 1050-4729

  11. [11]

    Automatic generation and detection of highly reliable fiducial markers under occlusion,

    S. Garrido-Jurado, R. Mu ˜noz-Salinas, F. J. Madrid-Cuevas, and M. J. Mar ´ın-Jim´enez, “Automatic generation and detection of highly reliable fiducial markers under occlusion,”Pattern Recog- nition, vol. 47, no. 6, pp. 2280–2292, June 2014

  12. [12]

    Autonomous ship deck landing of a quadrotor UA V using feed-forward image-based visual servoing,

    G. Cho, J. Choi, G. Bae, and H. Oh, “Autonomous ship deck landing of a quadrotor UA V using feed-forward image-based visual servoing,”Aerospace Science and Technology, vol. 130, p. 107869, November 2022

  13. [13]

    Robust Online Predictive Visual Servoing for Autonomous Landing of a Rotor UA V,

    L. Yang, X. Wang, Z. Liu, and L. Shen, “Robust Online Predictive Visual Servoing for Autonomous Landing of a Rotor UA V,”IEEE Transactions on Intelligent Vehicles, vol. 10, no. 4, pp. 2818– 2835, April 2025

  14. [14]

    Vision-based autonomous quadrotor landing on a moving platform,

    D. Falanga, A. Zanchettin, A. Simovic, J. Delmerico, and D. Scaramuzza, “Vision-based autonomous quadrotor landing on a moving platform,” in2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), October 2017, pp. 200–207, iSSN: 2475-8426

  15. [15]

    Landing a UA V in Harsh Winds and Turbulent Open Waters,

    P. M. Gupta, E. Pairet, T. Nascimento, and M. Saska, “Landing a UA V in Harsh Winds and Turbulent Open Waters,”IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 744–751, February 2023

  16. [16]

    A deep reinforcement learning strategy for uav autonomous landing on a moving platform,

    A. Rodriguez-Ramos, C. Sampedro, H. Bavle, P. de la Puente, and P. Campoy, “A deep reinforcement learning strategy for uav autonomous landing on a moving platform,”Journal of Intelligent & Robotic Systems, vol. 93, pp. 351–366, 2019

  17. [17]

    A deep reinforce- ment learning control strategy for vision-based ship landing of vertical flight aircraft,

    B. Lee, V . Saj, D. Kalathil, and M. Benedict, “A deep reinforce- ment learning control strategy for vision-based ship landing of vertical flight aircraft,” inAIAA AVIATION 2021 FORUM, 2021

  18. [18]

    Deep reinforcement learning with corrective feedback for autonomous uav landing on a mobile platform,

    L. Wu, C. Wang, P. Zhang, and C. Wei, “Deep reinforcement learning with corrective feedback for autonomous uav landing on a mobile platform,”Drones, vol. 6, no. 9, p. 238, 2022

  19. [19]

    A ship motion simulation system,

    S.-K. Ueng, D. Lin, and C.-H. Liu, “A ship motion simulation system,”Virtual Reality, vol. 12, no. 1, pp. 65–76, March 2008

  20. [20]

    Semantic-aware Active Perception for UA Vs using Deep Reinforcement Learning,

    L. Bartolomei, L. Teixeira, and M. Chli, “Semantic-aware Active Perception for UA Vs using Deep Reinforcement Learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Prague, Czech Republic: IEEE, September 2021, pp. 3101–3108

  21. [21]

    Champion-level drone racing using deep reinforcement learning,

    E. Kaufmann, L. Bauersfeld, A. Loquercio, M. M ¨uller, V . Koltun, and D. Scaramuzza, “Champion-level drone racing using deep reinforcement learning,”Nature, vol. 620, no. 7976, pp. 982– 987, August 2023

  22. [22]

    Demonstrating Agile Flight from Pixels without State Estima- tion,

    I. Geles, L. Bauersfeld, A. Romero, J. Xing, and D. Scaramuzza, “Demonstrating Agile Flight from Pixels without State Estima- tion,” June 2024, arXiv:2406.12505 [cs]

  23. [23]

    SRPose: Two-View Relative Pose Estimation with Sparse Keypoints,

    R. Yin, Y . Zhang, Z. Pan, J. Zhu, C. Wang, and B. Jia, “SRPose: Two-View Relative Pose Estimation with Sparse Keypoints,” in Computer Vision - ECCV 2024, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds. Cham: Springer Nature Switzerland, 2025, pp. 88–107

  24. [24]

    Aerial gym simulator: A framework for highly parallelized simulation of aerial robots,

    M. Kulkarni, W. Rehberg, and K. Alexis, “Aerial gym simulator: A framework for highly parallelized simulation of aerial robots,” IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 4093– 4100, 2025

  25. [25]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” August 2017, arXiv:1707.06347 [cs]

  26. [26]

    Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces,

    P. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces,” inProced- ings of the British Machine Vision Conference 2013. Bristol: British Machine Vision Association, 2013, pp. 13.1–13.11

  27. [27]

    Speeded-Up Robust Features (SURF),

    H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust Features (SURF),”Computer Vision and Image Under- standing, vol. 110, no. 3, pp. 346–359, June 2008

  28. [28]

    A Platform with Six Degrees of Freedom,

    D. Stewart, “A Platform with Six Degrees of Freedom,”Proceed- ings of the Institution of Mechanical Engineers, vol. 180, no. 1, pp. 371–386, June 1965