ReefMapGS: Enabling Large-Scale Underwater Reconstruction by Closing the Loop Between Multimodal SLAM and Gaussian Splatting
Pith reviewed 2026-05-10 15:37 UTC · model grok-4.3
The pith
ReefMapGS closes the loop between multimodal SLAM and 3D Gaussian Splatting to produce large-scale underwater reef models without structure-from-motion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReefMapGS builds an initial model from a high-certainty region and progressively expands it to cover the whole scene, interleaving local tracking of new image observations with optimization of the underlying 3DGS scene. The refined poses are then integrated back into the pose-graph to globally optimize the whole trajectory. The result is COLMAP-free 3D reconstruction of underwater reef sites with complex geometry, together with more accurate global pose estimation of the AUV over survey trajectories spanning up to 700 m.
What carries the argument
The closed feedback loop in which 3D Gaussian Splatting optimization refines camera poses that are then used to update the multimodal pose-graph, allowing incremental expansion from an initial high-certainty seed region.
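The control flow of such a loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function `incremental_reconstruction`, its three callables (`track`, `optimize_scene`, `optimize_pose_graph`), the 1D float poses, and the choice to run the global step after every new frame are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Set

def incremental_reconstruction(
    frames: Iterable[int],
    initial_poses: List[float],
    seed: Iterable[int],
    track: Callable[[int, float, Set[int]], float],
    optimize_scene: Callable[[Set[int], List[float]], None],
    optimize_pose_graph: Callable[[List[float], Dict[int, float]], List[float]],
) -> List[float]:
    """Hypothetical skeleton: seed model, then track / optimize / feed back."""
    poses = list(initial_poses)      # initial estimates, e.g. from multimodal SLAM
    mapped: Set[int] = set(seed)     # frames already included in the scene model
    refined: Dict[int, float] = {}   # poses refined against the scene model
    optimize_scene(mapped, poses)    # build the initial model from the seed region
    for f in frames:
        if f in mapped:
            continue
        refined[f] = track(f, poses[f], mapped)      # local tracking of a new image
        poses[f] = refined[f]
        mapped.add(f)
        optimize_scene(mapped, poses)                # expand and re-optimize the scene
        poses = optimize_pose_graph(poses, refined)  # feed refined poses back
    return poses

# Toy usage with stand-in callables (all hypothetical).
true_poses = [0.0, 1.0, 2.0, 3.0, 4.0]
drifted = [0.0, 1.0, 2.2, 3.5, 4.9]
log = []

def toy_track(f, guess, mapped):
    log.append(("track", f))
    return true_poses[f]  # pretend tracking recovers the true pose

def toy_scene(mapped, poses):
    log.append(("scene", len(mapped)))

def toy_graph(poses, refined):
    log.append(("graph", len(refined)))
    return [refined.get(i, p) for i, p in enumerate(poses)]

result = incremental_reconstruction(
    range(5), drifted, [0, 1], toy_track, toy_scene, toy_graph
)
```

The stand-in callables only record the order of operations; in a real system each would wrap a tracker, a 3DGS optimizer, and a pose-graph solver respectively.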
If this is right
- Underwater sites with complex geometry can be reconstructed at field scale without offline structure-from-motion processing.
- AUV pose estimates improve in global consistency when visual scene optimization is interleaved with multimodal pose-graph updates.
- Incremental model growth supports continuous operation over trajectories hundreds of meters long.
- Multimodal uncertainty estimates from the SLAM layer become usable for guiding the Gaussian Splatting optimization.
Where Pith is reading between the lines
- The same interleaving pattern could be tested in other sensor-rich but visually degraded settings such as murky water or low-light caves.
- Quantitative comparison against ground-truth trajectories on additional reef datasets would clarify how much the feedback loop reduces drift.
- Extending the method to streaming reconstruction might allow an AUV to maintain an up-to-date map during a single dive.
- The framework suggests that dense visual representations can serve as an external consistency check for any pose-graph SLAM system.
Load-bearing premise
That poses refined by 3D Gaussian Splatting can be fed back into the pose-graph optimizer to increase global consistency without causing divergence or new drift under variable underwater lighting and geometry.
What would settle it
A full reef survey trajectory processed end-to-end where the final global trajectory error or reconstruction completeness is worse after the pose feedback step than when using only the original multimodal SLAM poses.
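One way to make this test concrete is to compare a trajectory error metric before and after the feedback step. The sketch below uses the root-mean-square of per-pose position errors; the function names `ate_rmse` and `feedback_hurts`, the sample coordinates, and the assumption that all trajectories are already time-associated and expressed in the same frame are illustrative choices, not details taken from the paper.

```python
import math

def ate_rmse(estimate, ground_truth):
    """RMSE of position errors between two equally long lists of (x, y, z)."""
    if not estimate or len(estimate) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimate, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))

def feedback_hurts(slam_only, with_feedback, ground_truth):
    """True if the trajectory after pose feedback is worse than SLAM alone."""
    return ate_rmse(with_feedback, ground_truth) > ate_rmse(slam_only, ground_truth)

# Made-up sample data, for illustration only.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
slam = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.6, 0.0)]
fb = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]
worse = feedback_hurts(slam, fb, gt)
```

A `True` result on a full survey would be the falsifying outcome described above; reconstruction completeness would need a separate metric.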
Original abstract
3D Gaussian Splatting is a powerful visual representation, providing high-quality and efficient 3D scene reconstruction, but it is crucially dependent on accurate camera poses typically obtained from computationally intensive processes like structure-from-motion that are unsuitable for field robot applications. However, in these domains, multimodal sensor data from acoustic, inertial, pressure, and visual sensors are available and suitable for pose-graph optimization-based SLAM methods that can estimate the vehicle's trajectory and thus our needed camera poses while providing uncertainty. We propose a 3DGS-based incremental reconstruction framework, ReefMapGS, that builds an initial model from a high certainty region and progressively expands to incorporate the whole scene. We reconstruct the scene incrementally by interleaving local tracking of new image observations with optimization of the underlying 3DGS scene. These refined poses are integrated back into the pose-graph to globally optimize the whole trajectory. We show COLMAP-free 3D reconstruction of two underwater reef sites with complex geometry as well as more accurate global pose estimation of our AUV over survey trajectories spanning up to 700 m.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ReefMapGS, a framework for large-scale underwater 3D reconstruction that combines multimodal SLAM with 3D Gaussian Splatting. It initializes from a high-certainty region and incrementally expands the reconstruction by interleaving local tracking of new images with 3DGS optimization. Refined poses from this process are fed back into the multimodal pose-graph optimization to improve global consistency. The authors claim this enables COLMAP-free reconstruction of two complex reef sites and more accurate AUV pose estimation over trajectories up to 700 m.
Significance. If the closed-loop integration between 3DGS and multimodal SLAM proves robust, this work could significantly advance field-deployable underwater mapping by leveraging readily available sensor modalities to achieve high-quality reconstructions without reliance on computationally heavy offline SfM methods like COLMAP. This is particularly valuable for AUV operations in challenging marine environments where accurate large-scale models are needed for ecological monitoring or navigation.
major comments (2)
- [Abstract] The central claim of 'more accurate global pose estimation' over 700 m trajectories and successful COLMAP-free reconstruction of two reef sites is presented without any quantitative metrics, baselines, error bars, ablation studies, or uncertainty quantification. This absence is load-bearing because the soundness of the closed-loop improvement cannot be assessed from the given description alone.
- [Abstract] The incremental expansion process is described as feeding 3DGS-refined poses back into the pose-graph, but no explicit mechanism (e.g., uncertainty-aware weighting, selective insertion, or robust loss) is mentioned to guard against divergence from underwater photometric errors such as scattering, attenuation, or non-Lambertian surfaces. This directly bears on the weakest assumption that the loop reliably improves rather than corrupts global consistency.
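Two of the guards named in the second comment, selective insertion and a robust loss, can be illustrated on a scalar residual. This is a generic sketch, not a mechanism confirmed by the paper: `accept_refined_pose`, `huber_weight`, the 3-sigma gate, and the 1D setting are all hypothetical.

```python
def accept_refined_pose(refined, predicted, sigma, gate=3.0):
    """Selective insertion: keep a refined pose only if it lies within
    `gate` standard deviations of the pose predicted by SLAM."""
    return abs(refined - predicted) <= gate * sigma

def huber_weight(residual, delta):
    """Reweighting factor of the Huber loss: 1 inside the quadratic zone,
    delta / |residual| outside it, so large residuals are down-weighted."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= delta else delta / magnitude

# Made-up values: a small correction is accepted, a large jump is rejected.
small_ok = accept_refined_pose(10.2, 10.0, sigma=0.1)
large_ok = accept_refined_pose(11.0, 10.0, sigma=0.1)
```

Gating discards outliers outright, while a Huber-style weight keeps them in the optimization with reduced influence; which, if either, the manuscript uses is not stated in the abstract.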
minor comments (2)
- The abstract would be clearer if it briefly specified the exact multimodal sensors (acoustic, inertial, pressure, visual) and how their uncertainties are modeled in the initial pose-graph.
- A high-level diagram of the interleaving between local 3DGS tracking/optimization and global pose-graph update would improve readability of the incremental pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the abstract requires quantitative metrics to support the central claims and an explicit mention of the integration safeguards. We will revise the abstract accordingly and confirm that the manuscript body provides the supporting details and experiments.
- Referee: [Abstract] The central claim of 'more accurate global pose estimation' over 700 m trajectories and successful COLMAP-free reconstruction of two reef sites is presented without any quantitative metrics, baselines, error bars, ablation studies, or uncertainty quantification. This absence is load-bearing because the soundness of the closed-loop improvement cannot be assessed from the given description alone.
  Authors: We agree that the abstract would benefit from including key quantitative results to substantiate the claims. The full manuscript reports these evaluations in the experiments, including pose error reductions versus baselines over the 700 m trajectories and reconstruction metrics for the two reef sites. We will revise the abstract to summarize these quantitative findings, such as the reported accuracy improvements and COLMAP-free outcomes.
  Revision: yes
- Referee: [Abstract] The incremental expansion process is described as feeding 3DGS-refined poses back into the pose-graph, but no explicit mechanism (e.g., uncertainty-aware weighting, selective insertion, or robust loss) is mentioned to guard against divergence from underwater photometric errors such as scattering, attenuation, or non-Lambertian surfaces. This directly bears on the weakest assumption that the loop reliably improves rather than corrupts global consistency.
  Authors: The manuscript details the feedback mechanism in the methods section, where 3DGS-refined poses are incorporated into the pose-graph optimization using uncertainty estimates from the multimodal SLAM to provide weighting that mitigates the impact of underwater photometric errors. We acknowledge the abstract does not explicitly reference this safeguard. We will revise the abstract to briefly note the uncertainty-aware integration and strengthen the methods discussion on robustness to scattering, attenuation, and non-Lambertian effects.
  Revision: partial
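As a rough illustration of what uncertainty-based weighting can mean, the sketch below fuses two scalar estimates of the same pose coordinate by inverse-variance weighting. It is a textbook construction under assumed inputs, not the method described in the manuscript: `fuse_estimates` is a hypothetical name, and the variance assigned to the 3DGS-refined estimate is an assumption, since the text only mentions uncertainty from the multimodal SLAM.

```python
def fuse_estimates(x_slam, var_slam, x_refined, var_refined):
    """Inverse-variance fusion of two independent scalar estimates.
    Returns the fused value and its variance."""
    if var_slam <= 0 or var_refined <= 0:
        raise ValueError("variances must be positive")
    w_slam = 1.0 / var_slam        # more certain estimates get larger weights
    w_refined = 1.0 / var_refined
    fused_var = 1.0 / (w_slam + w_refined)
    fused = fused_var * (w_slam * x_slam + w_refined * x_refined)
    return fused, fused_var

# Made-up numbers: an uncertain SLAM estimate and a more certain refined one.
fused, fused_var = fuse_estimates(10.0, 4.0, 12.0, 1.0)
```

The fused value lands closer to the lower-variance input, and the fused variance is smaller than either input variance. A pose-graph solver applies the same principle through the information matrices attached to each factor.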
Circularity Check
No significant circularity; closed-loop method relies on empirical interaction of independent components
The paper describes an incremental framework that starts with multimodal SLAM poses from acoustic/inertial/pressure/visual data, builds an initial 3DGS model in a high-certainty region, interleaves local tracking and scene optimization, and feeds refined poses back into global pose-graph optimization. No equations, derivations, or fitted parameters are shown that reduce the claimed pose accuracy or reconstruction quality to quantities defined by the method itself. The central claim is an empirical assertion about the closed-loop behavior on real 700 m reef trajectories, not a tautological re-expression of inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify load-bearing steps.