
arxiv: 2605.02529 · v1 · submitted 2026-05-04 · 💻 cs.RO

Sim-to-Real Transfer and Robustness Evaluation of Reinforcement Learning Control with Integrated Perception on an ASV for Floating Waste Capture

Pith reviewed 2026-05-08 18:30 UTC · model grok-4.3

classification 💻 cs.RO
keywords sim-to-real transfer · reinforcement learning control · autonomous surface vessel · floating waste capture · polarimetric perception · robustness evaluation · perception abstraction · actuation model fidelity

The pith

A reinforcement learning controller for an autonomous surface vessel transfers directly from simulation to real-world floating waste capture with centimeter-level terminal accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a complete system that uses camera-based polarimetric perception to locate floating waste and feeds target points to a deep reinforcement learning controller that was trained only in simulation. To make the transfer work, the authors introduce a two-stage simulation protocol together with a perception abstraction module that mimics the behavior of the real camera. They test the same controller in matched simulation and field trials across fourteen different disturbance conditions and report centimeter-level terminal positioning accuracy together with generally robust behavior. The dominant remaining error source is mismatch in the model of how the boat responds to its actuators. The same pipeline is also shown working end-to-end in a real search-and-capture mission over areas of up to 450 square meters of water.
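The summary does not say how a detection in the image becomes a water-surface target point. One common approach, assumed here and not confirmed by the paper, is to back-project the detection's pixel through a pinhole camera model and intersect the ray with a flat water plane. The sketch below assumes known intrinsics (`fx`, `fy`, `cx`, `cy`), a known camera height and downward pitch, no lens distortion, and no boat roll; the function name `pixel_to_water_point` is hypothetical.

```python
import math
from typing import Optional, Tuple


def pixel_to_water_point(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    cam_height: float, pitch_down: float,
) -> Optional[Tuple[float, float]]:
    """Project pixel (u, v) onto the flat water plane z = 0 (hypothetical helper).

    Boat frame: x forward, y left, z up. The camera sits at (0, 0, cam_height),
    pitched down by `pitch_down` radians, with no roll or yaw offset and no
    lens distortion. Returns (x, y) in metres, or None above the horizon.
    """
    # Ray direction in the OpenCV optical frame (x right, y down, z forward).
    xc = (u - cx) / fx
    yc = (v - cy) / fy
    zc = 1.0
    # Re-express the ray in a level frame aligned with the boat axes.
    fwd, left, up = zc, -xc, -yc
    # Apply the downward pitch (rotation about the boat's y axis).
    c, s = math.cos(pitch_down), math.sin(pitch_down)
    dx = fwd * c + up * s
    dy = left
    dz = -fwd * s + up * c
    if dz >= 0.0:
        return None  # ray at or above the horizon never meets the water
    t = -cam_height / dz  # ray parameter where it reaches z = 0
    return (t * dx, t * dy)
```

For a camera 1 m above the water pitched down 45°, the principal point maps to a point 1 m straight ahead; pixels lower in the image land closer to the boat, pixels left of center land at positive y, and pixels at or above the horizon return `None`.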

Core claim

A DRL policy trained entirely inside a two-stage simulated environment that includes a perception abstraction module can be deployed without fine-tuning on a retrofitted ASV, converting real polarimetric camera detections into water-surface targets and achieving centimeter terminal accuracy while remaining robust across fourteen disturbance regimes; the principal performance limit is insufficient fidelity in the actuation model.

What carries the argument

Two-stage simulation protocol combined with a perception abstraction module that replicates real camera behavior, allowing the RL controller to be trained in simulation and transferred directly while quantifying the sim-to-real gap.
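As an illustration of what a perception abstraction module might do in simulation, the sketch below turns a ground-truth target position in the boat frame into the kind of output a camera detector would produce: nothing outside the field of view or beyond a maximum range, occasional missed detections, and position noise that grows with distance. All parameter names and the noise model are assumptions for illustration; the paper's actual module may model different effects.

```python
import math
import random
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x forward, y left) in metres, boat frame


def abstract_detection(
    target_xy: Point,
    fov_half_angle: float,   # radians, horizontal half field of view (assumed)
    max_range: float,        # metres beyond which the detector is assumed blind
    noise_per_metre: float,  # noise std-dev per metre of range (assumed model)
    p_miss: float,           # probability of dropping an in-view target (assumed)
    rng: random.Random,
) -> Optional[Point]:
    """Map a ground-truth target to a simulated camera detection, or None."""
    x, y = target_xy
    dist = math.hypot(x, y)
    bearing = math.atan2(y, x)
    # Outside the camera's horizontal field of view or too far away: no detection.
    if dist > max_range or abs(bearing) > fov_half_angle:
        return None
    # Randomly drop detections to mimic missed frames (for example glare).
    if rng.random() < p_miss:
        return None
    # Range-dependent Gaussian noise on the water-surface point.
    sigma = noise_per_metre * dist
    return (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
```

Because a policy trained this way only ever sees outputs of such a function, it has to cope with targets that vanish at the edge of view or jitter at long range, which is the kind of behavior an abstraction module of this type is meant to reproduce.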

If this is right

  • The deployed controller reaches centimeter-level terminal accuracy in real capture tasks.
  • Control performance stays robust across all fourteen evaluated disturbance regimes.
  • A full search-and-capture mission using live camera detections succeeds over areas up to 450 square meters.
  • Improving the fidelity of the actuation model is the most direct way to shrink the remaining sim-to-real gap.
  • Careful handling of latency and timestamps across perception, abstraction, and control modules is required for reliable operation.
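Timestamp handling matters because a detection computed from an image is only valid for the boat pose at the moment the image was captured, not when the result reaches the controller. A minimal sketch, assuming planar poses (x, y, yaw) logged at strictly increasing timestamps and detections expressed in the boat frame (x forward, y left), is to interpolate the pose at the image timestamp before transforming the detection into the world frame. `PoseBuffer` and `detection_to_world` are hypothetical names, not the authors' code.

```python
import bisect
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw) in the world frame


class PoseBuffer:
    """Hypothetical store of timestamped planar poses for latency compensation."""

    def __init__(self) -> None:
        self._stamps: List[float] = []
        self._poses: List[Pose] = []

    def add(self, stamp: float, x: float, y: float, yaw: float) -> None:
        if self._stamps and stamp <= self._stamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._stamps.append(stamp)
        self._poses.append((x, y, yaw))

    def at(self, stamp: float) -> Pose:
        """Linearly interpolate the pose at `stamp`, clamping outside the stored range."""
        if not self._stamps:
            raise ValueError("empty pose buffer")
        i = bisect.bisect_left(self._stamps, stamp)
        if i == 0:
            return self._poses[0]
        if i == len(self._stamps):
            return self._poses[-1]
        t0, t1 = self._stamps[i - 1], self._stamps[i]
        (x0, y0, h0), (x1, y1, h1) = self._poses[i - 1], self._poses[i]
        a = (stamp - t0) / (t1 - t0)
        # Interpolate yaw along the shortest arc (result is not re-wrapped).
        dh = math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0), h0 + a * dh)


def detection_to_world(buf: PoseBuffer, image_stamp: float,
                       rel_x: float, rel_y: float) -> Tuple[float, float]:
    """Transform a boat-frame detection using the pose at image capture time."""
    x, y, yaw = buf.at(image_stamp)
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * rel_x - s * rel_y, y + s * rel_x + c * rel_y)
```

If the boat travels from x = 0 to x = 2 m between t = 0 and t = 1 s and an image captured at t = 0.5 s sees a target 1 m ahead, `detection_to_world(buf, 0.5, 1.0, 0.0)` gives (2.0, 0.0); transforming with the latest pose instead would place the target at (3.0, 0.0), a 1 m error caused purely by latency. A deployed version would also bound the buffer's size.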

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • For other surface-vehicle tasks the dominant engineering effort should be spent on accurate actuator modeling rather than on more elaborate perception pipelines.
  • The same two-stage protocol offers a reusable template for quantifying transfer gaps in any water-surface robotic system.
  • Adding modest real-world domain randomization on top of the existing abstraction module could further reduce sensitivity to unmodeled disturbances.
  • The approach naturally extends to dynamic or moving targets, but new failure modes around target tracking latency would then need separate evaluation.
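To make the domain-randomization point concrete, one way to target the actuation mismatch the paper identifies is to resample thruster parameters at the start of every training episode. The sketch below assumes a differential-thrust ASV with normalized commands in [-1, 1] and randomizes a per-thruster gain, a symmetric deadband, and a fixed command delay in control steps. The vessel layout, the parameter set, and the ranges are placeholders, not values from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass
class ActuationParams:
    gain_left: float   # multiplicative thrust error, left thruster
    gain_right: float  # multiplicative thrust error, right thruster
    deadband: float    # |command| below this produces no thrust
    delay_steps: int   # transport delay, in control steps


def sample_actuation_params(rng: random.Random) -> ActuationParams:
    """Draw one episode's actuation parameters (placeholder ranges)."""
    return ActuationParams(
        gain_left=rng.uniform(0.8, 1.2),
        gain_right=rng.uniform(0.8, 1.2),
        deadband=rng.uniform(0.0, 0.1),
        delay_steps=rng.randint(0, 3),
    )


class RandomizedThrusters:
    """Applies sampled gain, deadband, and delay to normalized commands."""

    def __init__(self, params: ActuationParams) -> None:
        self.params = params
        # Pre-fill with zero commands so outputs lag inputs by delay_steps.
        self._queue: Deque[Tuple[float, float]] = deque([(0.0, 0.0)] * params.delay_steps)

    def step(self, cmd_left: float, cmd_right: float) -> Tuple[float, float]:
        self._queue.append((cmd_left, cmd_right))
        left, right = self._queue.popleft()
        return (self._shape(left, self.params.gain_left),
                self._shape(right, self.params.gain_right))

    def _shape(self, cmd: float, gain: float) -> float:
        if abs(cmd) < self.params.deadband:
            return 0.0
        return max(-1.0, min(1.0, gain * cmd))  # clamp to the valid command range
```

In a training loop, a fresh `RandomizedThrusters(sample_actuation_params(rng))` would be created at each episode reset, and the simulator would consume its outputs instead of the raw policy commands.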

Load-bearing premise

The two-stage simulation protocol and perception abstraction module capture real-world camera images and boat hydrodynamics closely enough that a controller trained only in simulation can be used on the physical ASV without any additional fine-tuning.

What would settle it

A side-by-side field comparison of terminal capture error when the real ASV is driven by the original simulation-trained controller versus the same controller after its actuation parameters have been corrected to match measured real-boat response.
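Correcting actuation parameters to match measured real-boat response implies a system-identification step. As a hedged sketch, assuming the response of interest (for example, surge speed after a thrust step) is well described by a discrete first-order lag y[k+1] = a·y[k] + b·u[k], the gain K and time constant τ can be recovered from a logged step response by linear least squares, using a = exp(-dt/τ) and b = K(1 - a). The model form and the function name `fit_first_order` are assumptions; the paper does not state which actuation model it uses.

```python
import math
from typing import Sequence, Tuple


def fit_first_order(u: Sequence[float], y: Sequence[float], dt: float) -> Tuple[float, float]:
    """Fit y[k+1] = a*y[k] + b*u[k] by least squares; return (gain K, time constant tau)."""
    if len(u) != len(y) or len(y) < 3:
        raise ValueError("u and y must have equal length >= 3")
    # Accumulate the 2x2 normal equations for the unknowns (a, b).
    syy = syu = suu = sny = snu = 0.0
    for k in range(len(y) - 1):
        syy += y[k] * y[k]
        syu += y[k] * u[k]
        suu += u[k] * u[k]
        sny += y[k + 1] * y[k]
        snu += y[k + 1] * u[k]
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("log is not informative enough (e.g. constant input)")
    a = (sny * suu - snu * syu) / det
    b = (syy * snu - syu * sny) / det
    if not 0.0 < a < 1.0:
        raise ValueError("fitted pole is not a stable first-order lag")
    tau = -dt / math.log(a)   # invert a = exp(-dt / tau)
    gain = b / (1.0 - a)      # steady-state gain from b = K * (1 - a)
    return gain, tau
```

The fitted values could then replace the simulator's nominal actuation parameters before re-running the same matched simulation and field comparison.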

Figures

Figures reproduced from arXiv: 2605.02529 by Cédric Pradalier, Luis F. W. Batista, Stéphanie Aravecchia.

Figure 1: Autonomous surface vessels fitted with nets for …
Figure 2: Three environments used in this work: (a) platform for end-to-end evaluation in field trials; (b) testbed for …
Figure 3: The system is powered by two 22 Ah, 4-cell LiPo batteries, providing over 5 hours of operation in our field trials. A DC–DC step-down converter supplies a regulated 12 V rail. Although the ASV carries a monocular camera and a 2D lidar, these were not used in the experiments reported here; instead, we use a Triton 5.0 MP (TRI050S1-QC) polarimetric camera built with a Sony IMX264MYR sensor, recording color a…
Figure 5 (no caption available)
Figure 6 (no caption available)
Figure 8: Calibration and range visualization.
Figure 9: Planar position error vs. distance-to-target. Er…
Figure 10: Definition of reference coordinate frame and …
Figure 11: Bar plots of the six metrics (…
Figure 12: Simulated and field-test paths for all experiments. Most cases show no visible degradation; …
Figure 13: Trajectory and actuator commands showing …
Figure 14 (no caption available)
Figure 15 (no caption available)
Figure 16 (no caption available)
Figure 17: Aggregate distribution of bottle detections in …
Figure 18: Representative bottle-detection scenarios during autonomous collection. (a) Distant targets detection. (b) …
Original abstract

Autonomous surface vessels for floating-waste removal operate under varying hydrodynamics, external disturbances, and challenging water-surface perception. We present a field-validated system that combines camera-based polarimetric perception with a lightweight DRL-based controller for floating-waste detection and capture. Camera detections are converted into water-surface target points and tracked by a controller trained entirely in simulation and deployed directly on a retrofitted ASV platform. Our main contribution is a sim-to-real testing methodology that combines a two-stage simulation protocol with a perception abstraction module designed to mimic real camera behavior, enabling reproducible field trials and explicit evaluation of the sim-to-real gap. We apply this framework in matched simulation and field experiments across 14 disturbance regimes to expose failure modes and evaluate robustness. The results show centimeter-level terminal accuracy and indicate robust control performance under the evaluated perturbation regimes. The main source of degradation is insufficient actuation-model fidelity. We also demonstrate the system in a search-and-capture application using real camera detections in real-world conditions over areas of up to 450 m². The study distills practical lessons for reliable transfer, including improved actuation-model fidelity, targeted domain randomization, and careful management of latency and timestamps across modules, while highlighting remaining challenges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims to have developed and field-validated a system for floating waste capture using an ASV equipped with camera-based polarimetric perception and a DRL controller trained entirely in simulation using a two-stage protocol and perception abstraction module. The controller is transferred directly to the real platform, achieving centimeter-level terminal accuracy and robust performance across 14 matched disturbance regimes in simulation and field experiments. The primary source of any degradation is attributed to insufficient actuation-model fidelity, and the system is demonstrated in a real-world search-and-capture task over areas up to 450 m². The work also provides practical lessons for sim-to-real transfer in such systems.

Significance. This research is significant for the field of robotics and control, as it addresses the challenging sim-to-real gap in a real-world application involving perception and control under hydrodynamic disturbances. The two-stage simulation protocol and perception abstraction module represent a thoughtful approach to making sim-to-real transfer more reproducible and evaluable. By explicitly identifying actuation fidelity as the main issue and providing lessons on domain randomization and latency management, the paper offers actionable insights that could benefit other researchers working on RL for marine vehicles. The field experiments add credibility to the claims.

minor comments (3)
  1. [Abstract and §5] The abstract and results sections should define 'centimeter-level terminal accuracy' with the precise metric (e.g., mean position error in meters) and report variability across trials.
  2. [§3.2] The description of the perception abstraction module would benefit from additional implementation details (e.g., how camera noise, distortion, or detection latency are modeled) to support reproducibility claims.
  3. [§5 and Figure 4] Figures showing the 14 disturbance regimes and corresponding accuracy results should include error bars and specify the number of trials per condition.
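The metric requested in the first and third minor comments can be stated precisely: the planar terminal error of a trial is the Euclidean distance between the boat's final position and the target, e = sqrt((x_f - x_t)² + (y_f - y_t)²), reported per disturbance regime as the trial count n, the mean, and the sample standard deviation. The sketch below computes that summary; the regime labels and the use of the sample standard deviation are illustrative assumptions, not the paper's definitions.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def terminal_error(final_xy: Point, target_xy: Point) -> float:
    """Planar Euclidean distance (metres) between final position and target."""
    return math.hypot(final_xy[0] - target_xy[0], final_xy[1] - target_xy[1])


def summarize(trials: Iterable[Tuple[str, Point, Point]]) -> Dict[str, Tuple[int, float, float]]:
    """Return {regime: (n_trials, mean_error_m, sample_std_m)}; std is NaN when n == 1."""
    per_regime: Dict[str, List[float]] = defaultdict(list)
    for regime, final_xy, target_xy in trials:
        per_regime[regime].append(terminal_error(final_xy, target_xy))
    summary: Dict[str, Tuple[int, float, float]] = {}
    for regime, errors in per_regime.items():
        std = statistics.stdev(errors) if len(errors) > 1 else float("nan")
        summary[regime] = (len(errors), statistics.mean(errors), std)
    return summary
```

Reporting n next to mean ± std for each regime gives both the error bars and the per-condition trial counts the referee asks for.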

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The referee summary correctly reflects our contributions regarding the two-stage simulation protocol, perception abstraction module, and field validation across 14 disturbance regimes. We appreciate the recognition of the work's significance for sim-to-real transfer in marine robotics.

Circularity Check

0 steps flagged

No significant circularity in experimental claims


The paper is an experimental robotics study reporting sim-to-real transfer results for a DRL controller on an ASV. Central claims (centimeter-level terminal accuracy, robustness across 14 disturbance regimes) rest on direct matched sim/field comparisons and explicit identification of degradation sources (actuation-model fidelity), not on any mathematical derivation, prediction, or first-principles result that reduces to its own inputs by construction. The two-stage simulation protocol and perception abstraction module are presented as methodological design choices whose effectiveness is evaluated experimentally rather than assumed or fitted tautologically. No load-bearing self-citations, self-definitional steps, or renamed empirical patterns appear in the provided claims. This is a standard honest experimental paper whose validation chain is external to the reported numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the fidelity of the simulation model and the abstraction of perception; no new entities are introduced, but the assumption about model accuracy is key.

axioms (1)
  • domain assumption The simulation environment can be made to approximate real-world hydrodynamics and perception sufficiently for controller transfer.
    Invoked in the description of the two-stage simulation protocol and perception abstraction module.



