arxiv: 2503.03282 · v1 · submitted 2025-03-05 · 💻 cs.RO

Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments

Pith reviewed 2026-05-23 01:31 UTC · model grok-4.3

classification 💻 cs.RO
keywords unmanned surface vehicle · visual docking · auto-labeling · pose estimation · neural network · autonomous navigation · position-based visual servo

The pith

A neural network predicts relative dock pose for USV autonomous docking using only auto-labeled real-world image pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a supervised learning pipeline for unmanned surface vehicles to dock autonomously by vision. An auto-labeling collection step pairs camera images with relative dock poses to build a training set without manual labeling. The Neural Dock Pose Estimator learns to output the dock's position and orientation directly from images. The resulting predictions support position-based visual servo control and low-level motion commands to complete the docking maneuver. Real-world water trials show the estimator remains accurate across changes in distance and vehicle speed.
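
The auto-labeling idea can be illustrated with a minimal sketch. It assumes that planar world-frame poses are available for both the USV and the dock at the moment each image is captured; this page does not describe how the paper obtains them, so the pose source, the `Pose2D` type, and the function names below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float    # metres, world frame (assumed convention)
    y: float    # metres, world frame (assumed convention)
    yaw: float  # radians, world frame (assumed convention)


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def relative_pose(usv: Pose2D, dock: Pose2D) -> Pose2D:
    """Express the dock pose in the USV body frame."""
    dx, dy = dock.x - usv.x, dock.y - usv.y
    c, s = math.cos(usv.yaw), math.sin(usv.yaw)
    # Rotate the world-frame offset by -usv.yaw to get body-frame coordinates.
    return Pose2D(
        x=c * dx + s * dy,
        y=-s * dx + c * dy,
        yaw=wrap_angle(dock.yaw - usv.yaw),
    )


def log_sample(dataset: list, image_id: str, usv: Pose2D, dock: Pose2D) -> None:
    """Append one (image, label) pair; the label needs no human annotation."""
    dataset.append((image_id, relative_pose(usv, dock)))
```

Each call to `log_sample` stores an image identifier together with the dock pose seen from the vehicle, which is the kind of image-pose pair the summary above describes. Label quality in such a scheme is bounded by the accuracy of whatever supplies the two world-frame poses.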

Core claim

The NDPE accurately predicts the relative dock pose in real-world water environments, facilitating the implementation of Position-Based Visual Servo (PBVS) and low-level motion controllers for efficient and autonomous docking, all without hand-crafted feature engineering, camera calibration, or peripheral markers.

What carries the argument

The Neural Dock Pose Estimator (NDPE), a neural network trained on auto-labeled image-relative-pose pairs to regress dock pose.

If this is right

  • NDPE outputs enable direct use of PBVS and low-level controllers for docking.
  • The estimator maintains accuracy despite changes in distance and USV velocity.
  • No manual labeling, camera calibration, or markers are required for operation.
  • Real-world validation confirms the pipeline supports full autonomous docking tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could reduce dependence on GPS or other external positioning for port operations.
  • Auto-labeling pipelines of this form might transfer to vision-based docking for other vehicle classes.
  • Pairing the estimator with existing low-level controllers could support longer fully autonomous missions.
  • Performance under conditions outside the collected auto-label data remains untested and could be checked with targeted trials.

Load-bearing premise

The auto-labeling data collection pipeline, running without manual intervention or external sensors, produces relative pose labels that are accurate enough to serve as ground truth and that generalize to the test conditions.

What would settle it

A side-by-side comparison of NDPE predictions against independent external-sensor pose measurements collected during repeated real-water docking runs, where large systematic errors would disprove the accuracy claim.
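
Such a comparison could be summarized as in the sketch below. It assumes two time-aligned lists of planar dock positions, one from the estimator and one from an independent reference sensor; the function name, the bin width, and the data layout are hypothetical. A non-zero bias would point to systematic error, and the per-distance RMSE shows whether accuracy degrades with range.

```python
import math
from collections import defaultdict


def error_report(predicted, reference, bin_width=1.0):
    """Compare predicted and reference dock positions.

    predicted, reference: equal-length lists of (x, y) in metres.
    Returns (bias_x, bias_y, rmse, rmse_by_distance_bin), where the last
    item maps the lower edge of each distance bin to the RMSE inside it.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    n = len(predicted)
    err_x = [p[0] - r[0] for p, r in zip(predicted, reference)]
    err_y = [p[1] - r[1] for p, r in zip(predicted, reference)]
    sq_err = [ex * ex + ey * ey for ex, ey in zip(err_x, err_y)]

    bias_x = sum(err_x) / n  # mean signed error along x
    bias_y = sum(err_y) / n  # mean signed error along y
    rmse = math.sqrt(sum(sq_err) / n)

    # Group squared errors by the reference USV-dock distance.
    bins = defaultdict(list)
    for (rx, ry), se in zip(reference, sq_err):
        bins[int(math.hypot(rx, ry) // bin_width)].append(se)
    rmse_by_bin = {
        index * bin_width: math.sqrt(sum(values) / len(values))
        for index, values in sorted(bins.items())
    }
    return bias_x, bias_y, rmse, rmse_by_bin
```

Running this over repeated docking runs would give one table per run; consistent sign and magnitude of the bias across runs is the pattern that would count against the accuracy claim.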

Figures

Figures reproduced from arXiv: 2503.03282 by Eng Gee Lim, Paolo Paoletti, Xiaohui Zhu, Yijie Chu, Yong Yue, Ziniu Wu.

Figure 1. Illustration of a real-world autonomous docking task. The USV starts at the …
Figure 2. Block diagrams of visual-servo methods.
    Adjacent paper text: … geometry for pose estimation. Cai et al. [12] introduced a position-based path planning algorithm for USVs, utilizing long-range Ultra-Wideband (UWB) positioning to automate docking. This approach combined a median filter (MF) with an extended Kalman filter (EKF) to refine the USV’s positional accuracy in UWB-based coordinates. Their findings demonstrated that UWB-b…
Figure 3. Illustration of the coordinate frames.
Figure 4. Overview of the framework. (a) The pipeline of data collection, augmentation …
Figure 5. Samples of logged data pairs.
    Adjacent paper text: … related to the dock. 2.2.2. Dataset Augmentation. The diverse source dataset D_src is created by introducing variations in water environments and different initial states. To further enhance the dataset, we apply a series of image operations that reflect the complexity and dynamics of water environments. These operations include adding Gaussian noise and dropping pixels to …
Figure 6. Dataset augmentation in two experimental scenarios.
Figure 7. Descriptions of the hardware and components’ connection.
Figure 8. Distributions of the dataset.
    Adjacent paper text: 3.3. Experiments on NDPE. The NDPE network was trained on our gathered data. We performed model training on a laptop (Intel i7-12700H processor and NVIDIA GeForce RTX 3070 Laptop GPU with CUDA). 80% of data points were used for model training, and 20% for model validation. The batch size was set at 32. A Stochastic Gradient Descent (SGD) optimizer with 0.001 learning rat…
Figure 9. The NDPE training loss.
    Adjacent paper text: …ing that the NDPE learns more adequately how to predict dock positions. The results from the validation set show that the model is robust, with a loss of around 0.04 on examples unseen during training. The NDPE model was run on our onboard computer. It achieved a single-frame rollout frequency of 6 Hz. 3.3.1. Effect of Data Efficiency. In …
Figure 10. The NDPE loss and data efficiency.
    Adjacent paper text: 3.3.2. Effect of USV-Dock Distance for a Single-frame Prediction. We were interested in the relationship between the model loss and the distance of the USV from the dock. Initially, we hypothesized that the closer the USV is to the dock, the smaller the model inference loss …
Figure 11. Distance-MSE relation for single-frame prediction.
Figure 12. Velocity-MSE relation for single-frame prediction.
Figure 13. A sequence of first-person and third-person images of a trial.
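
The text captured alongside Figure 5 mentions two augmentation operations, adding Gaussian noise and dropping pixels. The sketch below shows what those two operations do to a small grayscale image. It uses only the Python standard library rather than the imgaug package listed in the references, and the parameter values and the function name `augment` are illustrative assumptions, not the paper's settings.

```python
import random


def augment(image, noise_sigma=8.0, drop_prob=0.02, rng=None):
    """Return a noisy copy of a grayscale image.

    image: list of rows, each a list of integer intensities in 0..255.
    noise_sigma: standard deviation of the additive Gaussian noise.
    drop_prob: probability that a pixel is dropped (set to 0).
    rng: optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    augmented = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < drop_prob:
                new_row.append(0)  # dropped pixel
            else:
                noisy = pixel + rng.gauss(0.0, noise_sigma)
                # Clamp back into the valid intensity range.
                new_row.append(min(255, max(0, round(noisy))))
        augmented.append(new_row)
    return augmented
```

Because both operations change only pixel intensities and not the scene geometry, an augmented image can keep the relative pose label of its source image.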
Original abstract

Unmanned Surface Vehicles (USVs) are increasingly applied to water operations such as environmental monitoring and river-map modeling. It faces a significant challenge in achieving precise autonomous docking at ports or stations, still relying on remote human control or external positioning systems for accuracy and safety which limits the full potential of human-out-of-loop deployment for USVs. This paper introduces a novel supervised learning pipeline with the auto-labeling technique for USVs autonomous visual docking. Firstly, we designed an auto-labeling data collection pipeline that appends relative pose and image pair to the dataset. This step does not require conventional manual labeling for supervised learning. Secondly, the Neural Dock Pose Estimator (NDPE) is proposed to achieve relative dock pose prediction without the need for hand-crafted feature engineering, camera calibration, and peripheral markers. Moreover, The NDPE can accurately predict the relative dock pose in real-world water environments, facilitating the implementation of Position-Based Visual Servo (PBVS) and low-level motion controllers for efficient and autonomous docking. Experiments show that the NDPE is robust to the disturbance of the distance and the USV velocity. The effectiveness of our proposed solution is tested and validated in real-world water environments, reflecting its capability to handle real-world autonomous docking tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a supervised learning pipeline for autonomous visual docking of USVs. It features an auto-labeling data collection method that generates image-relative pose pairs without manual labeling, a Neural Dock Pose Estimator (NDPE) that predicts relative dock pose directly from images (avoiding hand-crafted features, calibration, and markers), and claims that the NDPE enables accurate PBVS-based docking. Experiments are said to demonstrate robustness to distance and velocity disturbances, with validation performed in real-world water environments.

Significance. If the auto-labeling pipeline yields accurate ground-truth labels and the NDPE generalizes with low error, the work could reduce reliance on external positioning systems for USV docking, supporting more fully autonomous operations in environmental monitoring and similar tasks. The empirical focus on real-water validation is a potential strength, but the absence of supporting metrics limits evaluation of practical impact.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'the NDPE can accurately predict the relative dock pose in real-world water environments' and that 'experiments show that the NDPE is robust to the disturbance of the distance and the USV velocity' is unsupported by any quantitative metrics, error distributions, baseline comparisons, or statistical analysis; without these, the empirical validation cannot be assessed.
  2. [Abstract and §3] Auto-labeling data collection pipeline (Abstract and §3): the description states that the pipeline 'appends relative pose and image pair to the dataset' without 'conventional manual labeling,' yet provides no mechanism, sensor integration, or verification procedure for obtaining the relative pose labels; this is load-bearing for the supervised training claim and the downstream PBVS assertion.
minor comments (1)
  1. [Abstract] Abstract: the sentence 'Moreover, The NDPE can accurately predict...' contains a capitalization inconsistency and should be rephrased to reflect that accuracy is a claim to be demonstrated rather than asserted.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and agree that revisions are needed to strengthen the empirical support and technical details.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'the NDPE can accurately predict the relative dock pose in real-world water environments' and that 'experiments show that the NDPE is robust to the disturbance of the distance and the USV velocity' is unsupported by any quantitative metrics, error distributions, baseline comparisons, or statistical analysis; without these, the empirical validation cannot be assessed.

    Authors: We agree the abstract claims require quantitative backing. The experimental section reports robustness tests, but we will revise the abstract to include specific metrics (e.g., mean pose errors, standard deviations, and disturbance ranges) along with brief baseline comparisons to support the accuracy and robustness assertions. revision: yes

  2. Referee: [Abstract and §3] Auto-labeling data collection pipeline (Abstract and §3): the description states that the pipeline 'appends relative pose and image pair to the dataset' without 'conventional manual labeling,' yet provides no mechanism, sensor integration, or verification procedure for obtaining the relative pose labels; this is load-bearing for the supervised training claim and the downstream PBVS assertion.

    Authors: We acknowledge the current description of the auto-labeling pipeline is insufficiently detailed. We will expand §3 to specify the mechanism (onboard RTK-GPS and IMU synchronization for relative pose computation), sensor integration steps, and verification (e.g., consistency checks against independent measurements), making the supervised training foundation explicit. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on real-world validation, not derivation

full rationale

The paper describes an empirical supervised-learning pipeline: an auto-labeling data-collection method produces image-pose pairs that train the NDPE network; performance is then measured on separate real-world test sequences. No equations, fitted parameters renamed as predictions, self-citation load-bearing uniqueness theorems, or ansatz smuggling appear. The auto-labeling step is a data-generation procedure whose accuracy is asserted to be sufficient for the downstream task; it is not shown to be definitionally equivalent to the network output. The central claim therefore remains an externally falsifiable statement about generalization on physical data rather than a reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the auto-generated labels are accurate enough to train a generalizable pose estimator and that real-world test conditions match the training distribution; no explicit free parameters, axioms, or invented entities are introduced beyond standard neural-network training.

axioms (2)
  • domain assumption: Auto-generated relative-pose labels from the collection pipeline are sufficiently accurate and unbiased for supervised training.
    Invoked when the pipeline is said to append relative pose and image pairs without manual labeling.
  • standard math: Standard supervised-learning assumptions hold (i.i.d. data, sufficient model capacity, appropriate loss).
    Implicit in any neural-network pose-estimation claim.

pith-pipeline@v0.9.0 · 5773 in / 1385 out tokens · 34204 ms · 2026-05-23T01:31:03.280287+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    Unmanned surface vehicles: An overview of developments and challenges,

    Z. Liu, Y. Zhang, X. Yu, and C. Yuan, “Unmanned surface vehicles: An overview of developments and challenges,” Annual Reviews in Control, vol. 41, pp. 71–93, 2016

  2. [2]

    Management and sustainable exploitation of marine environments through smart monitoring and automation,

    F. Glaviano, R. Esposito, A. D. Cosmo, F. Esposito, L. Gerevini, A. Ria, M. Molinara, P. Bruschi, M. Costantini, and V. Zupo, “Management and sustainable exploitation of marine environments through smart monitoring and automation,” Journal of Marine Science and Engineering, vol. 10, no. 2, p. 297, 2022

  3. [3]

    A review of unmanned system technologies with its application to aquaculture farm monitoring and management,

    N. A. Ubina and S.-C. Cheng, “A review of unmanned system technologies with its application to aquaculture farm monitoring and management,” Drones, vol. 6, no. 1, p. 12, 2022

  4. [4]

    Development and simulation of an autonomous docking system for unmanned surface vehicles (usv),

    S. R. Aune, “Development and simulation of an autonomous docking system for unmanned surface vehicles (usv),” Master’s thesis, NTNU, 2019

  5. [5]

    Autonomous maritime navigation: Developing autonomy skill sets for usvs,

    E. Hansen, T. Huntsberger, and L. Elkins, “Autonomous maritime navigation: Developing autonomy skill sets for usvs,” in Unmanned Systems Technology VIII, vol. 6230. SPIE, 2006, pp. 272–291

  6. [6]

    Toward autonomous robotic containment booms: Visual servoing for robust inter-vehicle docking of surface vehicles,

    Y.-H. Kim, S.-W. Lee, H. S. Yang, and D. A. Shell, “Toward autonomous robotic containment booms: Visual servoing for robust inter-vehicle docking of surface vehicles,” Intelligent Service Robotics, vol. 5, pp. 1–18, 2012

  7. [7]

    P. I. Corke, W. Jachimczyk, and R. Pillat, Robotics, vision and control: fundamental algorithms in MATLAB. Springer, 2011, vol. 73

  8. [8]

    Application of visual servo control in autonomous mobile rescue robots,

    H. Lang, M. T. Khan, K.-K. Tan, and C. W. de Silva, “Application of visual servo control in autonomous mobile rescue robots,” International Journal of Computers Communications & Control, vol. 11, no. 5, pp. 685–696, 2016

  9. [9]

    Comparing position- and image-based visual servoing for robotic assembly of large structures,

    Y.-C. Peng, D. Jivani, R. J. Radke, and J. Wen, “Comparing position- and image-based visual servoing for robotic assembly of large structures,” in 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE). IEEE, 2020, pp. 1608–1613.

  10. [10]

    Vision-based docking using an autonomous surface vehicle,

    M. Dunbabin, B. Lang, and B. Wood, “Vision-based docking using an autonomous surface vehicle,” in 2008 IEEE International Conference on Robotics and Automation. Pasadena, CA, USA: IEEE, May 2008, pp. 26–32. [Online]. Available: http://ieeexplore.ieee.org/document/4543182/

  11. [11]

    Vision-based positioning system for auto-docking of unmanned surface vehicles (USVs),

    Ø. Volden, A. Stahl, and T. I. Fossen, “Vision-based positioning system for auto-docking of unmanned surface vehicles (USVs),” International Journal of Intelligent Robotics and Applications, vol. 6, no. 1, pp. 86–103, Mar. 2022. [Online]. Available: https://link.springer.com/10.1007/s41315-021-00193-0

  12. [12]

    Long-Range UWB Positioning-Based Automatic Docking Trajectory Design for Unmanned Surface Vehicle,

    W. Cai, M. Zhang, Q. Yang, C. Wang, and J. Shi, “Long-Range UWB Positioning-Based Automatic Docking Trajectory Design for Unmanned Surface Vehicle,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2023. [Online]. Available: https://ieeexplore.ieee.org/document/10109770/

  13. [13]

    Extreme Learning-Based Monocular Visual Servo of an Unmanned Surface Vessel,

    N. Wang and H. He, “Extreme Learning-Based Monocular Visual Servo of an Unmanned Surface Vessel,” IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5152–5163, Aug. 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9240068/

  14. [14]

    Robust jacobian estimation for uncalibrated visual servoing,

    A. Shademan, A.-M. Farahmand, and M. Jägersand, “Robust jacobian estimation for uncalibrated visual servoing,” in 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 5564–5569

  15. [15]

    Image-based visual servoing for usv under wave perturbations,

    Z. Jiang, J. Li, D. Ma, and H. Zheng, “Image-based visual servoing for usv under wave perturbations,” in 2024 36th Chinese Control and Decision Conference (CCDC). IEEE, 2024, pp. 317–322

  16. [16]

    Image-based visual servoing for docking of an autonomous underwater vehicle,

    M. F. Yahya and M. Arshad, “Image-based visual servoing for docking of an autonomous underwater vehicle,” in 2017 IEEE 7th International Conference on Underwater System Technology: Theory and Applications (USYS). IEEE, 2017, pp. 1–6

  17. [17]

    Predictive control for constrained image-based visual servoing,

    G. Allibert, E. Courtial, and F. Chaumette, “Predictive control for constrained image-based visual servoing,” IEEE Transactions on Robotics, vol. 26, no. 5, pp. 933–939, 2010.

  18. [18]

    Adaptive neural network tracking control for underactuated systems with matched and mismatched disturbances,

    P. Liu, H. Yu, and S. Cang, “Adaptive neural network tracking control for underactuated systems with matched and mismatched disturbances,” Nonlinear Dynamics, vol. 98, no. 2, pp. 1447–1464, 2019

  19. [19]

    Intelligent motion control of unmanned surface vehicles: A critical review,

    M. J. Er, C. Ma, T. Liu, and H. Gong, “Intelligent motion control of unmanned surface vehicles: A critical review,” Ocean Engineering, vol. 280, p. 114562, 2023

  20. [20]

    Pk-apf: Path-keeping algorithm for usvs based on artificial potential field,

    Y. Chu, Z. Wu, Y. Yue, X. Zhu, E. G. Lim, and P. Paoletti, “Pk-apf: Path-keeping algorithm for usvs based on artificial potential field,” Applied Sciences, vol. 12, no. 16, p. 8201, 2022

  21. [21]

    Heading control of unmanned marine vehicles based on an improved robust adaptive fuzzy neural network control algorithm,

    Z. Dong, T. Bao, M. Zheng, X. Yang, L. Song, and Y. Mao, “Heading control of unmanned marine vehicles based on an improved robust adaptive fuzzy neural network control algorithm,” IEEE Access, vol. 7, pp. 9704–9713, 2019

  22. [22]

    Finite-time plos-based integral sliding-mode adaptive neural path following for unmanned surface vessels with unknown dynamics and disturbances,

    Y. Yu, C. Guo, and H. Yu, “Finite-time plos-based integral sliding-mode adaptive neural path following for unmanned surface vessels with unknown dynamics and disturbances,” IEEE Transactions on Automation Science and Engineering, vol. 16, no. 4, pp. 1500–1511, 2019

  23. [23]

    Bounded neural network control for target tracking of underactuated autonomous surface vehicles in the presence of uncertain target dynamics,

    L. Liu, D. Wang, Z. Peng, C. P. Chen, and T. Li, “Bounded neural network control for target tracking of underactuated autonomous surface vehicles in the presence of uncertain target dynamics,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 4, pp. 1241–1249, 2018

  24. [24]

    Robust adaptive self-structuring neural network bounded target tracking control of underactuated surface vessels,

    H. Liu, J. Lin, G. Yu, J. Yuan et al., “Robust adaptive self-structuring neural network bounded target tracking control of underactuated surface vessels,” Computational Intelligence and Neuroscience, vol. 2021, 2021

  25. [25]

    Neural network-based output feedback control for reference tracking of underactuated surface vessels,

    B. S. Park, J.-W. Kwon, and H. Kim, “Neural network-based output feedback control for reference tracking of underactuated surface vessels,” Automatica, vol. 77, pp. 353–359, 2017

  26. [26]

    Robust saturated dynamic surface controller design for underactuated fast surface vessels including actuator dynamics,

    O. Elhaki and K. Shojaei, “Robust saturated dynamic surface controller design for underactuated fast surface vessels including actuator dynamics,” Ocean Engineering, vol. 229, p. 108987, 2021.

  27. [27]

    An overview of developments and challenges for unmanned surface vehicle autonomous berthing,

    G. Wu, D. Li, H. Ding, D. Shi, and B. Han, “An overview of developments and challenges for unmanned surface vehicle autonomous berthing,” Complex & Intelligent Systems, vol. 10, no. 1, pp. 981–1003, 2024

  28. [28]

    Developing a navigation, guidance and obstacle avoidance algorithm for an unmanned surface vehicle (usv) by algorithms fusion,

    H. Mousazadeh, H. Jafarbiglu, H. Abdolmaleki, E. Omrani, F. Monhaseri, M.-r. Abdollahzadeh, A. Mohammadi-Aghdam, A. Kiapei, Y. Salmani-Zakaria, and A. Makhsoos, “Developing a navigation, guidance and obstacle avoidance algorithm for an unmanned surface vehicle (usv) by algorithms fusion,” Ocean Engineering, vol. 159, pp. 56–65, 2018

  29. [29]

    Toward maritime robotic simulation in gazebo,

    B. Bingham, C. Aguero, M. McCarrin, J. Klamo, J. Malia, K. Allen, T. Lum, M. Rawson, and R. Waqar, “Toward maritime robotic simulation in gazebo,” in Proceedings of MTS/IEEE OCEANS Conference, Seattle, WA, October 2019

  30. [30]

    Automatic generation and detection of highly reliable fiducial markers under occlusion,

    S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and M. Marín-Jiménez, “Automatic generation and detection of highly reliable fiducial markers under occlusion,” Pattern Recognition, vol. 47, no. 6, pp. 2280–2292, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0031320314000235

  31. [31]

    A. B. Jung, K. Wada, J. Crall, S. Tanaka, J. Graving, C. Reinders, S. Yadav, J. Banerjee, G. Vecsei, A. Kraft, Z. Rui, J. Borovec, C. Vallentin, S. Zhydenko, K. Pfeiffer, B. Cook, I. Fernández, F.-M. De Rainville, C.-H. Weng, A. Ayala-Acevedo, R. Meudec, M. Laporte et al., “imgaug,” https://github.com/aleju/imgaug, 2020, online; accessed 01-Feb-2020

  32. [32]

    Tinker-Twins/SINGABOAT-VRX: Team SINGABOAT-VRX’s GitHub Repository for Virtual RobotX (VRX) Competition

    “Tinker-Twins/SINGABOAT-VRX: Team SINGABOAT-VRX’s GitHub Repository for Virtual RobotX (VRX) Competition.” [Online]. Available: https://github.com/Tinker-Twins/SINGABOAT-VRX/tree/main