CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture
Pith reviewed 2026-05-10 11:06 UTC · model grok-4.3
The pith
CAVERS supplies 24 multimodal sequences from a natural karst cave, most of them with millimeter-accurate motion-capture ground truth.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present CAVERS, a multimodal dataset acquired in two structurally distinct rooms of Cueva de la Victoria, comprising 24 sequences totaling approximately 335 GB of recorded data. The sensor suite combines an Intel RealSense D435i RGB-D-I camera, an Optris PI640i near-IR thermal camera, and a Velodyne VLP-16 LiDAR, operated both handheld and mounted on a wheeled rover under full darkness and artificial illumination. For most of the sequences, mm-accurate 6-DoF ground truth pose and velocity at 120 Hz are provided by an Optirack motion capture system installed directly inside the cave. We benchmark seven state-of-the-art SLAM and odometry algorithms spanning visual, visual-inertial, thermal-
What carries the argument
CAVERS multimodal dataset of synchronized RGB-D, thermal, and LiDAR streams paired with OptiTrack 6-DoF ground-truth poses
If this is right
- SLAM algorithms can be evaluated on cave-specific challenges such as reflective wet surfaces and zero ambient light.
- Multimodal fusion methods combining visual, thermal, and LiDAR data can be tested with accurate ground truth.
- Odometry accuracy in irregular branching passages can be measured directly.
- 3D reconstruction pipelines can be assessed on natural karst geometry.
- Data from both handheld and rover-mounted capture is available for different motion profiles.
Where Pith is reading between the lines
- The dataset may accelerate development of navigation software specialized for search-and-rescue or scientific exploration inside caves.
- Comparable capture campaigns in other cave systems could build a broader corpus of natural environments.
- The ground-truth poses allow quantitative study of how each sensor modality improves robustness in darkness.
- Future releases could add longer connected traversals through multiple rooms and passages.
Load-bearing premise
The OptiTrack motion-capture system inside the cave delivers millimeter-accurate 6-DoF poses at 120 Hz without significant occlusion or calibration drift under the cave's lighting and surface conditions.
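One part of this premise, uninterrupted 120 Hz tracking, can be checked from the released timestamps alone. The following Python sketch flags dropouts in a ground-truth timestamp list. The function names, the 1.5-period tolerance, and the assumption that timestamps are seconds in ascending order are illustrative choices, not part of the dataset's tooling.

```python
from typing import List, Sequence, Tuple

NOMINAL_RATE_HZ = 120.0  # ground-truth rate stated in the abstract


def find_gaps(timestamps: Sequence[float],
              rate_hz: float = NOMINAL_RATE_HZ,
              tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are further apart
    than `tolerance` nominal periods (a hypothetical threshold)."""
    max_dt = tolerance / rate_hz
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > max_dt:
            gaps.append((prev, curr))
    return gaps


def coverage(timestamps: Sequence[float],
             rate_hz: float = NOMINAL_RATE_HZ,
             tolerance: float = 1.5) -> float:
    """Fraction of the recording span that is not lost to gaps."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        return 0.0
    lost = sum(end - start
               for start, end in find_gaps(timestamps, rate_hz, tolerance))
    return 1.0 - lost / span
```

A gap list of this kind says nothing about pose accuracy between dropouts; it only bounds how much of a sequence has ground truth at the nominal rate.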
What would settle it
An independent alignment of overlapping point clouds from separate sequences showing that the released OptiTrack trajectories deviate from the actual motion by more than a few millimeters.
Original abstract
Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this environment remain scarce and offer limited sensing modalities and environmental diversity. We present CAVERS, a multimodal dataset acquired in two structurally distinct rooms of Cueva de la Victoria, Málaga, Spain, comprising 24 sequences totaling approximately 335 GB of recorded data. The sensor suite combines an Intel RealSense D435i RGB-D-I camera, an Optris PI640i near-IR thermal camera, and a Velodyne VLP-16 LiDAR, operated both handheld and mounted on a wheeled rover under full darkness and artificial illumination. For most of the sequences, mm-accurate 6-DoF ground truth pose and velocity at 120 Hz are provided by an Optirack motion capture system installed directly inside the cave. We benchmark seven state-of-the-art SLAM and odometry algorithms spanning visual, visual-inertial, thermal-inertial, and LiDAR-based pipelines, as well as a 3D reconstruction pipeline, demonstrating the dataset's usability. %The dataset and all supplementary material are publicly available at: https://github.com/spaceuma/cavers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents CAVERS, a multimodal dataset of 24 sequences (~335 GB) collected in two rooms of a natural karstic cave using an Intel RealSense D435i (RGB-D-I), Optris PI640i thermal camera, and Velodyne VLP-16 LiDAR, operated both handheld and on a rover under darkness and artificial light. For most sequences it supplies 6-DoF ground truth at 120 Hz from an in-cave OptiTrack system and benchmarks seven SLAM/odometry algorithms (visual, visual-inertial, thermal-inertial, LiDAR) plus a 3D reconstruction pipeline to show usability.
Significance. If the ground-truth accuracy claim is substantiated, the dataset would fill a genuine gap: publicly available multimodal cave data with irregular geometry, wet reflective surfaces, and near-zero ambient light remain scarce. The combination of thermal, RGB-D, and LiDAR modalities plus rover/handheld modes and public release would enable more realistic SLAM evaluation than existing mine/tunnel datasets. The benchmarking exercise, while preliminary, provides an initial demonstration of utility.
major comments (2)
- [Abstract / Ground Truth section] Abstract and Ground-Truth description: the repeated claim of 'mm-accurate 6-DoF ground truth at 120 Hz' from the OptiTrack system is load-bearing for the dataset's value as a SLAM benchmark, yet no cave-specific error metrics (static repeatability, drift over sequence length, occlusion statistics, or cross-validation against another sensor) are supplied. Manufacturer specifications alone do not establish performance under the reported conditions of reflective wet surfaces and potential IR interference from the thermal camera.
- [Benchmarking / Results section] Benchmarking section: the statement that seven algorithms were evaluated 'demonstrating the dataset's usability' is not accompanied by any quantitative results (absolute trajectory error, relative pose error, failure rates, or per-modality comparisons). Without these numbers or failure-case statistics the central claim that the data are usable for SLAM research cannot be assessed.
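For reference, the absolute trajectory error requested here is commonly reported as an RMSE over time-associated position pairs. A minimal Python sketch follows, assuming the estimate has already been rigidly aligned to the ground-truth frame (that least-squares alignment step is not shown) and using a hypothetical 5 ms association window. The names `associate` and `ate_rmse` are illustrative and not taken from the paper.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# (timestamp in seconds, (x, y, z) position in metres)
Sample = Tuple[float, Tuple[float, float, float]]


def associate(gt: Sequence[Sample], est: Sequence[Sample],
              max_dt: float = 0.005) -> List[Tuple[Sample, Sample]]:
    """Pair each estimated sample with the nearest ground-truth sample in time.

    `gt` must be sorted by timestamp. Pairs whose timestamps differ by more
    than `max_dt` seconds (an assumed window) are dropped.
    """
    gt_times = [t for t, _ in gt]
    pairs = []
    for sample in est:
        t = sample[0]
        i = bisect.bisect_left(gt_times, t)
        # The nearest neighbour is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gt)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(gt_times[k] - t))
        if abs(gt_times[j] - t) <= max_dt:
            pairs.append((gt[j], sample))
    return pairs


def ate_rmse(pairs: Sequence[Tuple[Sample, Sample]]) -> float:
    """Root-mean-square Euclidean position error over associated pairs."""
    if not pairs:
        raise ValueError("no associated samples")
    squared = [math.dist(g[1], e[1]) ** 2 for g, e in pairs]
    return math.sqrt(sum(squared) / len(squared))
```

Samples that find no ground-truth partner are silently dropped, so the number of associated pairs should be reported next to the RMSE; otherwise a run that tracks only a short segment can look deceptively accurate.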
minor comments (3)
- [Abstract] Abstract contains the typo 'Optirack' (should be 'OptiTrack').
- [Abstract] The GitHub link to the dataset is commented out in the abstract; it should be restored and verified.
- [Methods / Sensor Suite] Sensor mounting details, exact calibration procedures between the three modalities, and synchronization timestamps are not described at a level that would allow independent reproduction.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the importance of substantiating the ground-truth accuracy and providing quantitative benchmarking results. We address each major comment below and will revise the manuscript to strengthen these aspects.
Point-by-point responses
- Referee: [Abstract / Ground Truth section] Abstract and Ground-Truth description: the repeated claim of 'mm-accurate 6-DoF ground truth at 120 Hz' from the OptiTrack system is load-bearing for the dataset's value as a SLAM benchmark, yet no cave-specific error metrics (static repeatability, drift over sequence length, occlusion statistics, or cross-validation against another sensor) are supplied. Manufacturer specifications alone do not establish performance under the reported conditions of reflective wet surfaces and potential IR interference from the thermal camera.
  Authors: We agree that cave-specific validation metrics are necessary to support the mm-accurate ground-truth claim under the reported conditions. The OptiTrack system was calibrated inside the cave, and the sensor suite was arranged to reduce marker occlusion and IR overlap with the thermal camera. However, the manuscript does not report quantitative error analysis. In the revised version we will add a dedicated paragraph to the Ground-Truth section containing: (i) static repeatability measurements performed in the cave, (ii) observed drift statistics over the recorded sequence lengths, (iii) occlusion rates encountered during data collection, and (iv) a brief discussion of potential thermal-camera IR interference together with the mitigation steps taken. These additions will directly address the referee’s concern.
  Revision: yes
- Referee: [Benchmarking / Results section] Benchmarking section: the statement that seven algorithms were evaluated 'demonstrating the dataset's usability' is not accompanied by any quantitative results (absolute trajectory error, relative pose error, failure rates, or per-modality comparisons). Without these numbers or failure-case statistics the central claim that the data are usable for SLAM research cannot be assessed.
  Authors: We acknowledge that the current text presents the benchmarking exercise largely qualitatively. Although the seven algorithms were run on the sequences, the manuscript does not tabulate the resulting error metrics. In the revised manuscript we will expand the Benchmarking section with a table reporting Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) for each algorithm and modality, together with failure rates and notes on the most challenging sequences (e.g., low-light or highly reflective surfaces). This will supply the concrete quantitative evidence needed to evaluate the dataset’s utility.
  Revision: yes
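The promised table could be assembled along the following lines. This Python sketch assumes per-sequence ATE values are already available and that a failed run is recorded as `None`. The data layout and the function names are hypothetical, and no values from the paper are used.

```python
import statistics
from typing import Dict, Optional

# Maps a sequence name to its ATE RMSE in metres, or None when the run failed
# (for example, tracking lost or no trajectory produced).
Results = Dict[str, Optional[float]]


def summarize(results: Results) -> Dict[str, Optional[float]]:
    """Return run count, failure rate, and median ATE over successful runs."""
    if not results:
        raise ValueError("no runs to summarize")
    successes = [v for v in results.values() if v is not None]
    return {
        "runs": float(len(results)),
        "failure_rate": 1.0 - len(successes) / len(results),
        "median_ate_m": statistics.median(successes) if successes else None,
    }


def format_table(per_algorithm: Dict[str, Results]) -> str:
    """Render one plain-text row per algorithm, sorted by name."""
    lines = [f"{'algorithm':<16}{'runs':>6}{'fail %':>8}{'median ATE [m]':>16}"]
    for name, results in sorted(per_algorithm.items()):
        s = summarize(results)
        ate = "n/a" if s["median_ate_m"] is None else f"{s['median_ate_m']:.3f}"
        lines.append(
            f"{name:<16}{int(s['runs']):>6}{100 * s['failure_rate']:>8.1f}{ate:>16}"
        )
    return "\n".join(lines)
```

Reporting the failure rate next to the median keeps an algorithm that succeeds only on the easy sequences from appearing more accurate than one that completes all of them.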
Circularity Check
No circularity: empirical dataset paper with no derivation chain
The paper presents a multimodal cave dataset and benchmarks existing SLAM algorithms on it. No mathematical derivations, parameter fittings, predictions, or self-referential models are claimed. Ground-truth pose is supplied by an external commercial motion-capture system; the paper states its use without deriving or fitting the accuracy figure internally. Benchmarking results are direct empirical outputs on the released data, not reductions to fitted inputs. The contribution is self-contained data collection and public release, with no load-bearing steps that collapse by construction to the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The OptiTrack motion-capture system installed directly inside the cave supplies mm-accurate 6-DoF pose and velocity at 120 Hz for most sequences.
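One way to probe this assumption is the static repeatability test mentioned in the rebuttal: record the marker set while the rig is stationary and measure the scatter. A minimal Python sketch follows, assuming positions in metres. The function name and the reported statistics are illustrative choices rather than the authors' protocol.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]  # position in metres


def static_repeatability(positions: Sequence[Point]) -> Dict[str, float]:
    """Scatter of a stationary marker set around its mean position, in mm."""
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    axes = [[p[k] for p in positions] for k in range(3)]
    mean = tuple(statistics.fmean(axis) for axis in axes)
    # Per-axis sample standard deviation.
    std = [statistics.stdev(axis) for axis in axes]
    # 3D distance of every sample from the mean position.
    dev = [math.dist(p, mean) for p in positions]
    return {
        "std_x_mm": 1000 * std[0],
        "std_y_mm": 1000 * std[1],
        "std_z_mm": 1000 * std[2],
        "rms_3d_mm": 1000 * math.sqrt(sum(d * d for d in dev) / len(dev)),
        "max_3d_mm": 1000 * max(dev),
    }
```

Static scatter is only a lower bound on the error: it does not capture calibration bias or errors that appear when the rig is moving.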