Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

Elisei Shafer; Oren Gal

arXiv: 2606.08513 · v1 · pith:UB2BN73Vnew · submitted 2026-06-07 · 💻 cs.RO · cs.LG · cs.SY · eess.SY


Pith reviewed 2026-06-27 18:31 UTC · model grok-4.3


keywords: autonomous underwater vehicles · reinforcement learning · hierarchical policies · end-to-end control · obstacle avoidance · sensor noise robustness · simulation evaluation


The pith

A hierarchical reinforcement learning system maps raw AUV sensor data directly to thruster commands for obstacle avoidance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether deep reinforcement learning can replace the usual separate modules for perception, planning, and control in underwater vehicles. It splits the task into a high-level policy that reads camera images, sonar, and position data to pick nearby subgoals and a low-level policy that turns those subgoals into motor commands. The high-level part learns from a few demonstrations while the low-level part uses experience replay. In simulation the resulting policy steers around obstacles on paths nearly as short as those from a standard planner and keeps working when sensors are noisy or visibility drops. The approach works well on shapes the vehicle has seen before but has trouble with new obstacle forms.

Core claim

The central claim is that an end-to-end hierarchical DRL controller, trained with RLPD in a modified SERL framework for the high-level policy and SAC plus HER for the low-level policy, produces obstacle-avoiding trajectories whose lengths stay within 4 to 6 percent of an RRT* baseline while remaining robust to added sensor noise and reduced visibility.

What carries the argument

A two-layer hierarchical policy carries the argument. A 2 Hz high-level network reads 84x84 camera frames, stacked 100x100 sonar images, and proprioceptive data and outputs spatial subgoals; a 10 Hz low-level network converts those subgoals into thruster actions.
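The timing relationship between the two policies can be sketched as a single loop ticking at the low-level rate, with the high-level policy queried once every five ticks (10 Hz / 2 Hz). This is a minimal sketch and not the authors' implementation: `run_episode`, its callable parameters, and their signatures are hypothetical names introduced for illustration; only the two rates come from the abstract.

```python
from typing import Callable, List, Sequence

HL_RATE_HZ = 2   # high-level policy rate, from the abstract
LL_RATE_HZ = 10  # low-level policy rate, from the abstract
LL_STEPS_PER_SUBGOAL = LL_RATE_HZ // HL_RATE_HZ  # 5 low-level ticks per subgoal

Vector = Sequence[float]


def run_episode(
    hl_policy: Callable[[dict], Vector],            # sensors -> subgoal (hypothetical)
    ll_policy: Callable[[Vector, Vector], Vector],  # (state, subgoal) -> thrust
    read_sensors: Callable[[], dict],
    read_state: Callable[[], Vector],
    apply_thrust: Callable[[Vector], None],
    n_ll_steps: int,
) -> List[Vector]:
    """Tick the low-level loop n_ll_steps times; return the subgoals issued."""
    subgoals: List[Vector] = []
    subgoal: Vector = ()
    for step in range(n_ll_steps):
        # Refresh the subgoal only on every fifth low-level tick.
        if step % LL_STEPS_PER_SUBGOAL == 0:
            subgoal = hl_policy(read_sensors())
            subgoals.append(subgoal)
        # The low-level policy acts on every tick against the latest subgoal.
        apply_thrust(ll_policy(read_state(), subgoal))
    return subgoals
```

With stub policies, 20 low-level ticks (two simulated seconds) would issue 4 subgoals and 20 thruster commands, which is the rate decoupling the summary describes.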

If this is right

Obstacle avoidance succeeds with paths close in length to those produced by RRT* planning.
The learned behavior tolerates simulated sensor noise and lowered visibility without retraining.
Sample-efficient training is possible by combining prior demonstrations for the high-level policy with hindsight replay for the low-level policy.
Navigation remains reliable on obstacle geometries encountered during training but degrades on novel shapes.
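The hindsight replay mentioned for the low-level policy can be illustrated with the "final" relabeling strategy from the Hindsight Experience Replay paper: every transition in an episode is stored a second time with the goal replaced by the state the vehicle actually reached. This is a sketch under assumptions, not the paper's code. The `Transition` layout, the sparse 0 / -1 reward, and the 0.5 tolerance are hypothetical choices made for illustration; the summary does not say which relabeling strategy or reward the authors use.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float, float]


class Transition(NamedTuple):
    state: Point               # position before the action
    action: Tuple[float, ...]  # thruster command (opaque here)
    next_state: Point          # position reached, i.e. the achieved goal
    goal: Point                # subgoal the policy was asked to reach
    reward: float


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.5) -> float:
    """Assumed reward: 0 when within tol of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_final(episode: List[Transition], tol: float = 0.5) -> List[Transition]:
    """Return copies of the episode relabeled with its final achieved state as goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal, tol))
        for t in episode
    ]
```

Both the original and the relabeled transitions would go into the replay buffer, so an episode that missed its subgoal still yields at least one transition with a non-negative reward.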

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulation-to-real gap can be closed, the same architecture could cut the amount of custom software needed when moving an AUV to a new site.
Adding more sensor types or running multiple vehicles together would be a direct next test of the same subgoal-passing structure.
Collecting a small set of real-world demonstration trajectories could be enough to fine-tune the high-level policy for better generalization.

Load-bearing premise

Results obtained inside the HoloOcean simulator with its particular sensor and noise models will carry over to real AUV hardware and real ocean settings.

What would settle it

Deploy the trained policy on physical AUV hardware in an instrumented tank or open water with obstacles and measure whether trajectory lengths and collision rates match the simulator numbers within the same 4-6 percent band.
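The trajectory-length comparison in this test can be made concrete as a relative overhead against the baseline path. The sketch below assumes that "within 4 to 6 percent" means the policy path is that much longer than the RRT* path, which is one reading of the claim; the function names and the 6 percent default are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def path_length(points: Sequence[Point]) -> float:
    """Sum of straight-line segment lengths along a waypoint sequence."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def overhead_percent(policy_path: Sequence[Point], baseline_path: Sequence[Point]) -> float:
    """Extra length of the policy path relative to the baseline, in percent."""
    base = path_length(baseline_path)
    if base == 0.0:
        raise ValueError("baseline path has zero length")
    return 100.0 * (path_length(policy_path) - base) / base


def within_band(overhead: float, max_overhead: float = 6.0) -> bool:
    """True when the overhead does not exceed the assumed upper bound."""
    return overhead <= max_overhead
```

The same two functions would apply unchanged to logged hardware trajectories, so simulator and tank numbers could be compared on one metric.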

Figures

Figures reproduced from arXiv: 2606.08513 by Elisei Shafer, Oren Gal.

**Figure 3.** Human expert demonstration acquisition. The human

**Figure 5.** Comparison of trajectories in seen vs. unseen environ

**Figure 6.** RRT* planner and PD waypoint follower.

2) Robustness to Noise: Noise was added to test sensor robustness. Each sensor, except for the FLS, is modeled with Gaussian white noise $\eta \sim \mathcal{N}(0, \sigma^2)$ applied to the ground-truth measurements. The specific noise standard deviations ($\sigma$) and configurations are detailed in Table III.

**Figure 7.** Comparison of Forward-Looking Sonar (FLS) output

**Figure 8.** Comparison of two levels of fog

**Figure 9.** Test with goals in seen areas and sensor noise.
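The noise model quoted with the Figure 6 caption, zero-mean Gaussian white noise added to ground-truth measurements, can be sketched in a few lines. The function name and the example sigma are hypothetical; the actual per-sensor standard deviations are in the paper's Table III and are not reproduced here. Per the caption, the FLS is excluded from this model.

```python
import random
from typing import List, Sequence


def add_gaussian_noise(
    measurement: Sequence[float], sigma: float, rng: random.Random
) -> List[float]:
    """Return measurement + eta, with eta drawn i.i.d. from N(0, sigma^2)."""
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    return [x + rng.gauss(0.0, sigma) for x in measurement]


# Example: perturb a ground-truth position with a placeholder sigma of 0.1.
rng = random.Random(0)  # fixed seed so a noisy evaluation run can be repeated
noisy_position = add_gaussian_noise([1.0, 2.0, -5.0], sigma=0.1, rng=rng)
```

Passing the generator in explicitly keeps the noise reproducible across evaluation runs, which matters when comparing noisy and noise-free trajectories.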

Original abstract

Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. Simultaneously, a Low-Level (LL) policy operating at 10Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy utilizes Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, our method demonstrates successful obstacle avoidance, achieving trajectory lengths closely approximating (within 4% to 6% of) an $\text{RRT}^*$ planning baseline. Furthermore, the learned policy exhibits strong robustness to simulated sensor noise and decreased visibility. While the system navigates familiar geometries effectively, experiments reveal generalization limitations when encountering unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Hierarchical RL gets AUV trajectories within 4-6% of RRT* in HoloOcean, but the entire claim rests on simulation, with no real-world or transfer evidence.


The main takeaway is that this hierarchical setup (a 2 Hz high-level policy on 84x84 camera plus 100x100 sonar input, feeding subgoals to a 10 Hz low-level SAC+HER policy) produces obstacle-avoiding paths in the simulator that stay close to an RRT* baseline while showing some noise robustness.

What is actually new is the specific application: RLPD inside a modified SERL framework for the high level, combined with camera-sonar fusion and proprioception for AUVs. Prior work on these RL pieces exists, but the paper puts them together for this sensor stack and task in a way that is not already described in the cited literature. The simulation results are concrete on trajectory length and noise tolerance, which is the part that works.

The soft spots are straightforward. All numbers come from HoloOcean with no hardware tests, no domain randomization ablations, and no discussion of hydrodynamic or thruster effects that would matter on real AUVs. Generalization to novel shapes is noted as limited but not analyzed. Training details and statistical reporting are thin. The sim-to-real gap is load-bearing for the stated goal of cutting engineering pipelines, and it is not closed.

This is for readers working on RL applications in robotics who want to see an AUV example in simulation. It is not yet useful for anyone needing deployment evidence. The work shows clear thinking on the architecture and honest reporting of the sim limits, so it deserves a serious referee to check the training setup and ask for transfer experiments.

Referee Report

2 major / 0 minor

Summary. The paper proposes a hierarchical deep reinforcement learning (DRL) architecture for end-to-end AUV navigation that maps raw sensor inputs (84x84 monocular camera, 100x100 sonar, proprioception) directly to thruster commands. A high-level policy at 2 Hz generates spatial subgoals via RLPD within a modified SERL framework; a low-level policy at 10 Hz converts subgoals to actions via SAC+HER. Evaluated solely in the HoloOcean simulator, the method achieves obstacle avoidance with trajectory lengths within 4-6% of an RRT* baseline and shows robustness to simulated sensor noise and reduced visibility, while noting limited generalization to novel obstacle shapes.

Significance. If the simulation results prove transferable, the work would demonstrate a viable path toward reducing heavily engineered perception-planning-control pipelines for AUVs through sample-efficient hierarchical DRL. The independent RRT* benchmarking and explicit acknowledgment of generalization limits are strengths; however, the absence of real-world validation or domain-randomization studies limits immediate impact on hardware deployment.

major comments (2)

[Abstract] Abstract and Evaluation section: All headline performance claims (trajectory lengths within 4-6% of RRT*, robustness to sensor noise) rest exclusively on HoloOcean simulation with the described 84x84 camera + 100x100 sonar models. The central goal of reducing manual engineering pipelines requires that these results survive transfer to real AUV hardware, yet no real-world experiments, hardware-in-the-loop tests, domain-randomization ablations, or analysis of unmodeled hydrodynamic/thruster effects are provided to support this assumption.
[Abstract] Abstract: The training procedures, hyperparameters, and statistical significance of the reported trajectory-length and robustness metrics receive limited detail, making it difficult to assess reproducibility or the precise contribution of the RLPD/SERL and SAC+HER components.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. Our work presents a simulation-based study of hierarchical DRL for AUV navigation in HoloOcean, with explicit discussion of its scope and limitations. We address each major comment below.


Referee: [Abstract] Abstract and Evaluation section: All headline performance claims (trajectory lengths within 4-6% of RRT*, robustness to sensor noise) rest exclusively on HoloOcean simulation with the described 84x84 camera + 100x100 sonar models. The central goal of reducing manual engineering pipelines requires that these results survive transfer to real AUV hardware, yet no real-world experiments, hardware-in-the-loop tests, domain-randomization ablations, or analysis of unmodeled hydrodynamic/thruster effects are provided to support this assumption.

Authors: We agree the evaluation is confined to high-fidelity simulation and that real-world transfer remains an open question for hardware deployment. The manuscript already states the generalization limits to novel obstacle shapes. As a feasibility demonstration of end-to-end DRL in simulation (benchmarked against RRT*), we do not include real-world or domain-randomization experiments, which would require physical hardware access beyond the current scope. The simulation results still illustrate the potential to reduce engineered pipelines under the modeled conditions.

Revision: no

Referee: [Abstract] Abstract: The training procedures, hyperparameters, and statistical significance of the reported trajectory-length and robustness metrics receive limited detail, making it difficult to assess reproducibility or the precise contribution of the RLPD/SERL and SAC+HER components.

Authors: We will expand the methods and evaluation sections to provide the full set of training hyperparameters for RLPD within the modified SERL framework and for SAC+HER, along with details on the training procedures and statistical analysis (means and standard deviations across multiple random seeds) for the trajectory-length and robustness metrics. This will improve reproducibility and clarify component contributions.

Revision: yes
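The statistical reporting promised in this response (means and standard deviations across random seeds) amounts to a small aggregation step. A minimal sketch, assuming one scalar metric per seed such as the path-length overhead in percent; the function name is hypothetical and the numbers in the usage note are placeholders, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize_across_seeds(per_seed_values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric over seeds."""
    if len(per_seed_values) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return statistics.mean(per_seed_values), statistics.stdev(per_seed_values)
```

For placeholder per-seed overheads of 4.0, 5.0, and 6.0 percent, this would report a mean of 5.0 and a sample standard deviation of 1.0. The sample (n - 1) form is used because the seeds are a draw from a larger population of possible runs.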

standing simulated objections not resolved

Absence of real-world experiments, hardware-in-the-loop tests, or domain-randomization studies, as these require physical AUV hardware and resources outside the simulation-focused scope of the manuscript.

Circularity Check

0 steps flagged

No significant circularity; evaluation uses independent RRT* baseline and standard RL methods.


The paper describes an empirical hierarchical RL architecture (HL policy at 2 Hz on camera/sonar/proprioception, LL at 10 Hz via SAC+HER) trained with RLPD/SERL and evaluated in HoloOcean simulator. Trajectory lengths are compared to an external RRT* planner (within 4-6%), with no equations, fitted parameters, or self-citations that reduce the central performance claims to inputs by construction. The derivation chain consists of standard algorithmic choices and simulator-based benchmarking against an independent baseline; no self-definitional, fitted-input, or uniqueness-imported steps appear. This is the common honest finding for simulation-only RL papers whose metrics do not loop back on themselves.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract does not detail specific free parameters or axioms beyond standard RL assumptions; reward design and MDP formulation are implicit.

free parameters (1)

reward function weights
Standard in RL but not specified; likely tuned for the task.

axioms (1)

domain assumption: The environment can be modeled as two separate MDPs for high and low level policies.
Invoked in the hierarchical architecture description.

pith-pipeline@v0.9.1-grok · 5826 in / 1316 out tokens · 38102 ms · 2026-06-27T18:31:45.983943+00:00 · methodology


