BALTIC: A Benchmark and Cross-Domain Strategy for 3D Reconstruction Across Air and Underwater Domains Under Varying Illumination

David Nakath; Ignacio Carlucho; Jonatan Scharff Willners; Michele Grimaldi; Oscar Pizarro; Yvan R. Petillot

arxiv: 2604.19133 · v2 · submitted 2026-04-21 · 💻 cs.CV

BALTIC: A Benchmark and Cross-Domain Strategy for 3D Reconstruction Across Air and Underwater Domains Under Varying Illumination

Michele Grimaldi , David Nakath , Oscar Pizarro , Jonatan Scharff Willners , Ignacio Carlucho , Yvan R. Petillot This is my paper

Pith reviewed 2026-05-10 03:16 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D reconstructionunderwater imagingGaussian Splattingbenchmark datasetcross-domain transferair-water transitionillumination variationStructure-from-Motion

0 comments

The pith

A controlled benchmark finds that 3D Gaussian Splatting with basic white-balance correction matches specialized underwater reconstruction methods when textures and lighting stay consistent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BALTIC, a set of 13 sequences captured in a custom tank across air and water under ambient, artificial, and mixed lighting, with motion and initialization variations. It runs Structure-from-Motion through COLMAP, then feeds the results into Neural Radiance Fields and 3D Gaussian Splatting, while testing the effect of adding a few in-air frames to underwater sequences. The central result is that, in these stable, texture-consistent settings, Gaussian Splatting after simple preprocessing reaches accuracy and visual quality on par with methods built specifically for underwater conditions. This matters because it shows cross-domain 3D reconstruction does not always require complex domain-specific machinery when the environment is controlled.

Core claim

Under the controlled conditions of the BALTIC benchmark, which includes accurate ground-truth poses from an HTC Vive tracker and consistent scene textures, 3D Gaussian Splatting combined with straightforward preprocessing steps such as white balance correction produces trajectory accuracy, scene geometry, and rendered outputs comparable to those of specialized underwater reconstruction techniques, although performance drops when the same approach is applied to more varied real-world scenes.

What carries the argument

The BALTIC benchmark datasets together with the cross-domain augmentation strategy of mixing a small number of in-air views into underwater sequences before applying COLMAP, NeRF, or Gaussian Splatting.

If this is right

Adding a few in-air frames to underwater sequences improves both trajectory accuracy and scene geometry when using standard reconstruction pipelines.
Simple color correction steps such as white balance restore enough radiometric consistency for Gaussian Splatting to compete with domain-specific methods.
Performance remains high across ambient, artificial, and mixed lighting as long as scene texture stays consistent.
The same pipeline shows reduced robustness once the environment becomes heterogeneous, indicating the need for further adaptation in uncontrolled settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Robotic systems that move between air and water could rely on a single lightweight pipeline rather than maintaining separate air and underwater models when lighting and texture conditions are stable.
Future benchmarks should add controlled variations in scattering and sediment to isolate which environmental factors cause the observed drop in real-world performance.
The color-restoration analysis could be extended to test whether the same preprocessing also benefits NeRF-based methods in the same cross-domain setting.

Load-bearing premise

The custom water tank with uniform textures and precise HTC Vive ground truth captures the essential challenges of real heterogeneous underwater environments that contain varying illumination and scattering.

What would settle it

Re-running the identical Gaussian Splatting pipeline on field data from a heterogeneous underwater site and finding that its trajectory error or perceptual image quality falls substantially below that of a specialized underwater method on the same sequences.

read the original abstract

Robust 3D reconstruction across varying environmental conditions remains a critical challenge for robotic perception, particularly when transitioning between air and water. To address this, we introduce BALTIC, a controlled benchmark designed to systematically evaluate modern 3D reconstruction methods under variations in medium and lighting. The benchmark comprises 13 datasets spanning two media (air and water) and three lighting conditions (ambient, artificial, and mixed), with additional variations in motion type, scanning pattern, and initialization trajectory, resulting in a diverse set of sequences. Our experimental setup features a custom water tank equipped with a monocular camera and an HTC Vive tracker, enabling accurate ground-truth pose estimation. We further investigate cross-domain reconstruction by augmenting underwater image sequences with a small number of in-air views captured under similar lighting conditions. We evaluate Structure-from-Motion reconstruction using COLMAP in terms of both trajectory accuracy and scene geometry, and use these reconstructions as input to Neural Radiance Fields and 3D Gaussian Splatting methods. The resulting models are assessed against ground-truth trajectories and in-air references, while rendered outputs are compared using perceptual and photometric metrics. Additionally, we perform a color restoration analysis to evaluate radiometric consistency across domains. Our results show that under controlled, texture-consistent conditions, Gaussian Splatting with simple preprocessing (e.g., white balance correction) can achieve performance comparable to specialized underwater methods, although its robustness decreases in more complex and heterogeneous real-world environments

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BALTIC gives a controlled tank benchmark for air-water 3D recon where basic Gaussian Splatting plus white balance matches specialized methods, but the setup keeps textures fixed so generalization stays limited.

read the letter

BALTIC is a new benchmark for 3D reconstruction that tests methods when switching between air and water under different lighting. The headline result is that in their controlled tank with consistent textures, Gaussian Splatting plus simple white balance correction performs about as well as specialized underwater approaches, though it loses ground in messier settings. The paper does a few things well. It collects 13 datasets that cover air and water, three lighting setups, and variations in how the camera moves. The custom tank with HTC Vive tracking gives reliable ground truth poses, which is better than many underwater datasets. The cross-domain augmentation, where they add a handful of in-air images to underwater sequences, is a straightforward idea that could help initialization. They evaluate the full pipeline from COLMAP structure-from-motion through to NeRF and 3D Gaussian Splatting, looking at both pose accuracy and rendered image quality. The color restoration part adds a useful check on radiometric consistency across domains. These elements together create a more complete picture than most prior work on underwater 3D recon. The main soft spot is the controlled nature of the setup. By keeping textures the same, the benchmark isolates the effects of the medium and lighting, but real underwater environments have changing surfaces, more variable scattering, and less predictable illumination. The authors acknowledge that performance drops outside their tank, but the paper would be stronger with some tests on less uniform data to show the gap. I also wonder about the scale of the results; the abstract talks about comparability but does not give the actual numbers or ablations, so the full version needs to include those to let readers judge the effect sizes. This paper is for people working on robotic perception in mixed air-water domains or on domain-robust 3D reconstruction techniques. A reader who needs a standardized test set for comparing COLMAP-NeRF-3DGS pipelines would get direct value from it. It deserves a serious referee because the benchmark is fresh and the evaluation protocol is clear, even if the claims stay scoped to controlled conditions. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces BALTIC, a controlled benchmark with 13 datasets spanning air and underwater media under ambient, artificial, and mixed lighting, plus variations in motion and scanning. Using a custom water tank with monocular camera and HTC Vive tracker for ground-truth poses, the authors run COLMAP for SfM, then feed the results into NeRF and 3D Gaussian Splatting. They also test cross-domain augmentation by adding a small number of in-air views and evaluate trajectory accuracy, geometry, rendered outputs via perceptual/photometric metrics, and color restoration. The central result is that, under the controlled texture-consistent tank conditions, Gaussian Splatting with simple preprocessing (e.g., white-balance correction) reaches performance comparable to specialized underwater methods, while noting reduced robustness outside this regime.

Significance. If the quantitative claims hold, the work supplies a useful, reproducible benchmark that isolates medium and illumination effects while holding scene texture fixed, which is valuable for robotic perception research. The finding that standard GS plus minimal preprocessing can match specialized methods inside the controlled setting offers a practical baseline and highlights the value of the cross-domain augmentation strategy. The release of the 13 datasets and the explicit scoping to controlled conditions are strengths that support future comparative studies.

major comments (2)

Abstract and §4 (Experimental Results): the manuscript states that GS with white-balance correction achieves performance comparable to specialized underwater methods, yet provides no full quantitative tables, error bars, ablation studies, or statistical tests comparing all methods across the 13 sequences. Without these, it is impossible to assess whether the comparability claim is robust or influenced by unstated selection criteria.
§3.2 (Benchmark Construction): the custom water tank is presented as sufficiently representative for isolating medium and lighting variations, but the text does not quantify how the fixed textures and controlled scattering compare to real heterogeneous underwater environments; this assumption is load-bearing for the claim that results generalize beyond the tank.

minor comments (2)

§2 (Related Work): several citations to recent underwater NeRF/GS papers are missing; adding them would better situate the cross-domain augmentation strategy.
Figure captions and Table 1: ensure every lighting/medium combination is explicitly labeled so readers can map the 13 datasets to the reported metrics without ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment in detail below.

read point-by-point responses

Referee: Abstract and §4 (Experimental Results): the manuscript states that GS with white-balance correction achieves performance comparable to specialized underwater methods, yet provides no full quantitative tables, error bars, ablation studies, or statistical tests comparing all methods across the 13 sequences. Without these, it is impossible to assess whether the comparability claim is robust or influenced by unstated selection criteria.

Authors: We agree that a more comprehensive presentation of the quantitative results is necessary to substantiate the comparability claim. In the revised manuscript, we will expand §4 to include full tables reporting all evaluation metrics for NeRF, 3D Gaussian Splatting (with and without preprocessing), and specialized underwater methods across each of the 13 sequences. We will incorporate error bars derived from repeated experiments where feasible, detailed ablation studies on the impact of white-balance correction and other preprocessing steps, and statistical tests (such as Wilcoxon signed-rank tests) to compare performance. These additions will ensure transparency and allow readers to verify the robustness of our findings independently of any selection. revision: yes
Referee: §3.2 (Benchmark Construction): the custom water tank is presented as sufficiently representative for isolating medium and lighting variations, but the text does not quantify how the fixed textures and controlled scattering compare to real heterogeneous underwater environments; this assumption is load-bearing for the claim that results generalize beyond the tank.

Authors: The benchmark is explicitly scoped to controlled conditions with fixed textures to isolate the effects of the medium and illumination, as emphasized in the abstract and throughout the paper. We do not claim that the results generalize directly to heterogeneous real-world underwater scenes; in fact, the manuscript already states that 'its robustness decreases in more complex and heterogeneous real-world environments.' To address the referee's concern, we will revise §3.2 to include a quantitative discussion of the differences, drawing on established values from underwater optics literature for scattering and attenuation coefficients, and describe how our tank's controlled scattering (low turbidity) differs from typical ocean or lake environments with variable particulates and textures. This will better delineate the scope without altering the core contribution. revision: partial

Circularity Check

0 steps flagged

Empirical benchmark with no derivations or self-referential reductions

full rationale

The paper introduces the BALTIC benchmark dataset and performs an empirical evaluation of standard 3D reconstruction pipelines (COLMAP SfM, NeRF, Gaussian Splatting) on air/water sequences under controlled lighting. All reported results consist of direct comparisons against ground-truth trajectories, in-air references, and perceptual metrics after simple preprocessing steps such as white-balance correction. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claim is explicitly scoped to the custom tank's texture-consistent regime and does not extrapolate via any internal reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark and evaluation paper. No mathematical models, free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5590 in / 1108 out tokens · 46369 ms · 2026-05-10T03:16:56.305061+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

URL https://arxiv.org/abs/2103.13415

Barron JT, Mildenhall B, Tancik M, Hedman P , Martin- Brualla R and Srinivasan PP (2021) Mip-nerf: A multiscale representation for anti-aliasing neural radiance ﬁelds. URL https://arxiv.org/abs/2103.13415. Grimaldi M, Nakath D, She M and K ¨oser K (2023) Investigation of the challenges of underwater-visual-monocular-slam. ISPRS Annals of the Photogrammetr...

work page doi:10.1109/tpami.2023.3307567 2021
[2]

Summers JM, Jones MW and Seale C (2025) Impact of underwater image enhancement on feature matching

Springer. Summers JM, Jones MW and Seale C (2025) Impact of underwater image enhancement on feature matching. Sensors 25(22):

work page 2025
[3]

DOI:10.3390/s25226966. Tancik M, Weber E, Ng E, Li R, Yi B, Wang T, Kristoffersen A, Austin J, Salahi K, Ahuja A, Mcallister D, Kerr J and Kanazawa A (2023) Nerfstudio: A modular framework for neural radiance ﬁeld development. In: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings , SIGGRAPH ’23. ACM, ...

work page doi:10.3390/s25226966 2023

[1] [1]

URL https://arxiv.org/abs/2103.13415

Barron JT, Mildenhall B, Tancik M, Hedman P , Martin- Brualla R and Srinivasan PP (2021) Mip-nerf: A multiscale representation for anti-aliasing neural radiance ﬁelds. URL https://arxiv.org/abs/2103.13415. Grimaldi M, Nakath D, She M and K ¨oser K (2023) Investigation of the challenges of underwater-visual-monocular-slam. ISPRS Annals of the Photogrammetr...

work page doi:10.1109/tpami.2023.3307567 2021

[2] [2]

Summers JM, Jones MW and Seale C (2025) Impact of underwater image enhancement on feature matching

Springer. Summers JM, Jones MW and Seale C (2025) Impact of underwater image enhancement on feature matching. Sensors 25(22):

work page 2025

[3] [3]

DOI:10.3390/s25226966. Tancik M, Weber E, Ng E, Li R, Yi B, Wang T, Kristoffersen A, Austin J, Salahi K, Ahuja A, Mcallister D, Kerr J and Kanazawa A (2023) Nerfstudio: A modular framework for neural radiance ﬁeld development. In: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings , SIGGRAPH ’23. ACM, ...

work page doi:10.3390/s25226966 2023