BALTIC: A Benchmark and Cross-Domain Strategy for 3D Reconstruction Across Air and Underwater Domains Under Varying Illumination
Pith reviewed 2026-05-10 03:16 UTC · model grok-4.3
The pith
A controlled benchmark finds that 3D Gaussian Splatting with basic white-balance correction matches specialized underwater reconstruction methods when textures and lighting stay consistent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the controlled conditions of the BALTIC benchmark, which includes accurate ground-truth poses from an HTC Vive tracker and consistent scene textures, 3D Gaussian Splatting combined with straightforward preprocessing steps such as white balance correction produces trajectory accuracy, scene geometry, and rendered outputs comparable to those of specialized underwater reconstruction techniques. Performance drops when the same approach is applied to more varied real-world scenes.
What carries the argument
The BALTIC benchmark datasets together with the cross-domain augmentation strategy of mixing a small number of in-air views into underwater sequences before applying COLMAP, NeRF, or Gaussian Splatting.
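The mixing step can be pictured with a minimal sketch. It assumes frames are identified by file paths and that the in-air views are sampled evenly before being appended to the underwater sequence; the function name `mix_in_air_views`, the paths, and the sampling rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def mix_in_air_views(underwater: Sequence[str], in_air: Sequence[str], k: int) -> List[str]:
    """Return the underwater frames followed by k evenly sampled in-air frames.

    Assumption: the combined list is handed to the reconstruction pipeline
    as a single image set.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(in_air))
    if k == 0:
        return list(underwater)
    if k == 1:
        picks = [in_air[len(in_air) // 2]]
    else:
        # Evenly spaced positions from the first to the last in-air frame.
        step = (len(in_air) - 1) / (k - 1)
        picks = [in_air[round(i * step)] for i in range(k)]
    return list(underwater) + picks


# Hypothetical file names, for illustration only.
underwater_frames = [f"water/{i:04d}.png" for i in range(200)]
in_air_frames = [f"air/{i:04d}.png" for i in range(50)]
mixed = mix_in_air_views(underwater_frames, in_air_frames, k=5)
print(len(mixed), mixed[-5:])
```

Even sampling is only one possible choice; the abstract specifies no more than "a small number of in-air views captured under similar lighting conditions".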
If this is right
- Adding a few in-air frames to underwater sequences improves both trajectory accuracy and scene geometry when using standard reconstruction pipelines.
- Simple color correction steps such as white balance restore enough radiometric consistency for Gaussian Splatting to compete with domain-specific methods.
- Performance remains high across ambient, artificial, and mixed lighting as long as scene texture stays consistent.
- The same pipeline shows reduced robustness once the environment becomes heterogeneous, indicating the need for further adaptation in uncontrolled settings.
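To make the "simple color correction" idea concrete, here is a minimal sketch of one common white-balance method, the gray-world assumption. This page does not say which white-balance algorithm the paper uses, so gray-world is a stand-in chosen for illustration, and the input frame below is synthetic.

```python
import numpy as np


def gray_world_white_balance(image: np.ndarray) -> np.ndarray:
    """Scale each channel so that all channel means equal the global mean.

    `image` is an H x W x 3 float array with values in [0, 1].
    """
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    target = channel_means.mean()
    # Guard against an all-zero channel to avoid division by zero.
    gains = target / np.maximum(channel_means, 1e-8)
    return np.clip(img * gains, 0.0, 1.0)


# Synthetic frame with a blue-green cast, standing in for an underwater image.
rng = np.random.default_rng(0)
frame = rng.uniform(0.2, 0.8, size=(4, 4, 3)) * np.array([0.4, 0.9, 1.0])
balanced = gray_world_white_balance(frame)
print(balanced.reshape(-1, 3).mean(axis=0))
```

Gray-world applies only a per-channel gain, so it cannot undo distance-dependent attenuation or scattering. That limitation is consistent with the reduced robustness the review reports outside the controlled tank.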
Where Pith is reading between the lines
- Robotic systems that move between air and water could rely on a single lightweight pipeline rather than maintaining separate air and underwater models when lighting and texture conditions are stable.
- Future benchmarks should add controlled variations in scattering and sediment to isolate which environmental factors cause the observed drop in real-world performance.
- The color-restoration analysis could be extended to test whether the same preprocessing also benefits NeRF-based methods in the same cross-domain setting.
Load-bearing premise
The custom water tank with uniform textures and precise HTC Vive ground truth captures the essential challenges of real heterogeneous underwater environments that contain varying illumination and scattering.
What would settle it
Re-running the identical Gaussian Splatting pipeline on field data from a heterogeneous underwater site and finding that its trajectory error or perceptual image quality falls substantially below that of a specialized underwater method on the same sequences.
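A trajectory-error comparison of this kind is often reported as absolute trajectory error (ATE) after a similarity alignment, because monocular reconstructions have no absolute scale. The sketch below assumes that metric and uses Umeyama's closed-form alignment. The paper's exact evaluation protocol is not given on this page, and the trajectory here is synthetic.

```python
import numpy as np


def align_similarity(est: np.ndarray, gt: np.ndarray):
    """Least-squares scale s, rotation R, translation t with gt ~ s * R @ est + t."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # flip the last axis so R is a rotation, not a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)
    s = np.trace(np.diag(D) @ S) / var_e
    t = mu_g - s * R @ mu_e
    return s, R, t


def ate_rmse(est, gt) -> float:
    """Root-mean-square position error after similarity alignment."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    s, R, t = align_similarity(est, gt)
    aligned = s * (R @ est.T).T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))


# Synthetic check: a helix expressed in a rotated, scaled, shifted frame.
tt = np.linspace(0, 4 * np.pi, 100)
gt = np.stack([np.cos(tt), np.sin(tt), 0.1 * tt], axis=1)
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
est = 0.5 * (Rz @ gt.T).T + np.array([1.0, -2.0, 0.3])
print(ate_rmse(est, gt))
```

The alignment removes the arbitrary frame and scale, so the remaining error reflects trajectory shape only. The same function could score both pipelines on the same field sequences.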
Original abstract
Robust 3D reconstruction across varying environmental conditions remains a critical challenge for robotic perception, particularly when transitioning between air and water. To address this, we introduce BALTIC, a controlled benchmark designed to systematically evaluate modern 3D reconstruction methods under variations in medium and lighting. The benchmark comprises 13 datasets spanning two media (air and water) and three lighting conditions (ambient, artificial, and mixed), with additional variations in motion type, scanning pattern, and initialization trajectory, resulting in a diverse set of sequences. Our experimental setup features a custom water tank equipped with a monocular camera and an HTC Vive tracker, enabling accurate ground-truth pose estimation. We further investigate cross-domain reconstruction by augmenting underwater image sequences with a small number of in-air views captured under similar lighting conditions. We evaluate Structure-from-Motion reconstruction using COLMAP in terms of both trajectory accuracy and scene geometry, and use these reconstructions as input to Neural Radiance Fields and 3D Gaussian Splatting methods. The resulting models are assessed against ground-truth trajectories and in-air references, while rendered outputs are compared using perceptual and photometric metrics. Additionally, we perform a color restoration analysis to evaluate radiometric consistency across domains. Our results show that under controlled, texture-consistent conditions, Gaussian Splatting with simple preprocessing (e.g., white balance correction) can achieve performance comparable to specialized underwater methods, although its robustness decreases in more complex and heterogeneous real-world environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BALTIC, a controlled benchmark with 13 datasets spanning air and underwater media under ambient, artificial, and mixed lighting, plus variations in motion and scanning. Using a custom water tank with a monocular camera and an HTC Vive tracker for ground-truth poses, the authors run COLMAP for SfM, then feed the results into NeRF and 3D Gaussian Splatting. They also test cross-domain augmentation by adding a small number of in-air views, and they evaluate trajectory accuracy, geometry, rendered outputs (via perceptual and photometric metrics), and color restoration. The central result is that, under the controlled texture-consistent tank conditions, Gaussian Splatting with simple preprocessing (e.g., white-balance correction) reaches performance comparable to specialized underwater methods. The authors note reduced robustness outside this regime.
Significance. If the quantitative claims hold, the work supplies a useful, reproducible benchmark that isolates medium and illumination effects while holding scene texture fixed, which is valuable for robotic perception research. The finding that standard GS plus minimal preprocessing can match specialized methods inside the controlled setting offers a practical baseline and highlights the value of the cross-domain augmentation strategy. The release of the 13 datasets and the explicit scoping to controlled conditions are strengths that support future comparative studies.
major comments (2)
- Abstract and §4 (Experimental Results): the manuscript states that GS with white-balance correction achieves performance comparable to specialized underwater methods, yet provides no full quantitative tables, error bars, ablation studies, or statistical tests comparing all methods across the 13 sequences. Without these, it is impossible to assess whether the comparability claim is robust or influenced by unstated selection criteria.
- §3.2 (Benchmark Construction): the custom water tank is presented as sufficiently representative for isolating medium and lighting variations, but the text does not quantify how the fixed textures and controlled scattering compare to real heterogeneous underwater environments; this assumption is load-bearing for the claim that results generalize beyond the tank.
minor comments (2)
- §2 (Related Work): several citations to recent underwater NeRF/GS papers are missing; adding them would better situate the cross-domain augmentation strategy.
- Figure captions and Table 1: ensure every lighting/medium combination is explicitly labeled so readers can map the 13 datasets to the reported metrics without ambiguity.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment in detail below.
Point-by-point responses
- Referee: Abstract and §4 (Experimental Results): the manuscript states that GS with white-balance correction achieves performance comparable to specialized underwater methods, yet provides no full quantitative tables, error bars, ablation studies, or statistical tests comparing all methods across the 13 sequences. Without these, it is impossible to assess whether the comparability claim is robust or influenced by unstated selection criteria.
  Authors: We agree that a more comprehensive presentation of the quantitative results is necessary to substantiate the comparability claim. In the revised manuscript, we will expand §4 to include full tables reporting all evaluation metrics for NeRF, 3D Gaussian Splatting (with and without preprocessing), and specialized underwater methods across each of the 13 sequences. We will incorporate error bars derived from repeated experiments where feasible, detailed ablation studies on the impact of white-balance correction and other preprocessing steps, and statistical tests (such as Wilcoxon signed-rank tests) to compare performance. These additions will ensure transparency and allow readers to verify the robustness of our findings independently of any selection.
  Revision: yes
- Referee: §3.2 (Benchmark Construction): the custom water tank is presented as sufficiently representative for isolating medium and lighting variations, but the text does not quantify how the fixed textures and controlled scattering compare to real heterogeneous underwater environments; this assumption is load-bearing for the claim that results generalize beyond the tank.
  Authors: The benchmark is explicitly scoped to controlled conditions with fixed textures to isolate the effects of the medium and illumination, as emphasized in the abstract and throughout the paper. We do not claim that the results generalize directly to heterogeneous real-world underwater scenes; in fact, the manuscript already states that 'its robustness decreases in more complex and heterogeneous real-world environments.' To address the referee's concern, we will revise §3.2 to include a quantitative discussion of the differences, drawing on established values from underwater optics literature for scattering and attenuation coefficients, and describe how our tank's controlled scattering (low turbidity) differs from typical ocean or lake environments with variable particulates and textures. This will better delineate the scope without altering the core contribution.
  Revision: partial
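The rebuttal's first response proposes Wilcoxon signed-rank tests for comparing methods across sequences. As a rough illustration of what such a paired comparison involves, the sketch below computes an exact two-sided test in pure Python by enumerating all sign assignments, which is feasible for the 13 sequences involved. The scores are placeholder values invented for the example and are not results from the paper.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # tied magnitudes share their average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    total = n * (n + 1) / 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign assignment is equally likely.
    count = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w + 1e-12:
            count += 1
    return w, count / 2 ** n


# Placeholder per-sequence scores for two methods (NOT values from the paper).
method_a = [2, 4, 6, 8, 10, 12]
method_b = [1, 2, 3, 4, 5, 6]
w_stat, p_value = wilcoxon_signed_rank_exact(method_a, method_b)
print(w_stat, p_value)
```

With only 13 paired sequences the test has limited power, so a non-significant result would not by itself establish that two methods are comparable.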
Circularity Check
Empirical benchmark with no derivations or self-referential reductions
The paper introduces the BALTIC benchmark dataset and performs an empirical evaluation of standard 3D reconstruction pipelines (COLMAP SfM, NeRF, Gaussian Splatting) on air/water sequences under controlled lighting. All reported results consist of direct comparisons against ground-truth trajectories, in-air references, and perceptual metrics after simple preprocessing steps such as white-balance correction. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claim is explicitly scoped to the custom tank's texture-consistent regime and does not extrapolate via any internal reduction to its own inputs.