
arxiv: 2604.15052 · v1 · submitted 2026-04-16 · 💻 cs.RO

CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture

Pith reviewed 2026-05-10 11:06 UTC · model grok-4.3

classification 💻 cs.RO
keywords multimodal dataset · SLAM · karst cave · ground truth · motion capture · RGB-D · LiDAR · thermal imaging

The pith

CAVERS supplies 24 multimodal sequences from a natural karst cave with millimeter-accurate motion-capture ground truth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robots in natural karst caves must handle irregular geometry, wet reflective surfaces, and near-zero light that defeat standard SLAM methods. Public datasets from mines or tunnels do not capture these conditions. The authors collected CAVERS, a 335 GB multimodal dataset in two distinct rooms of Cueva de la Victoria, Spain. It contains 24 sequences recorded with RGB-D, thermal, and LiDAR sensors, both handheld and on a rover, under darkness and artificial light. Most sequences include 120 Hz six-degree-of-freedom ground-truth poses from an internal motion-capture system. Benchmarks on seven existing SLAM pipelines confirm the data supports immediate algorithm testing.

Core claim

We present CAVERS, a multimodal dataset acquired in two structurally distinct rooms of Cueva de la Victoria, comprising 24 sequences totaling approximately 335 GB of recorded data. The sensor suite combines an Intel RealSense D435i RGB-D-I camera, an Optris PI640i near-IR thermal camera, and a Velodyne VLP-16 LiDAR, operated both handheld and mounted on a wheeled rover under full darkness and artificial illumination. For most of the sequences, mm-accurate 6-DoF ground truth pose and velocity at 120 Hz are provided by an Optirack motion capture system installed directly inside the cave. We benchmark seven state-of-the-art SLAM and odometry algorithms spanning visual, visual-inertial, thermal-

What carries the argument

CAVERS multimodal dataset of synchronized RGB-D, thermal, and LiDAR streams paired with OptiTrack 6-DoF ground-truth poses

If this is right

  • SLAM algorithms can be evaluated on cave-specific challenges such as reflective wet surfaces and zero ambient light.
  • Multimodal fusion methods combining visual, thermal, and LiDAR data can be tested with accurate ground truth.
  • Odometry accuracy in irregular branching passages can be measured directly.
  • 3D reconstruction pipelines can be assessed on natural karst geometry.
  • Data from both handheld and rover-mounted capture is available for different motion profiles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset may accelerate development of navigation software specialized for search-and-rescue or scientific exploration inside caves.
  • Comparable capture campaigns in other cave systems could build a broader corpus of natural environments.
  • The ground-truth poses allow quantitative study of how each sensor modality improves robustness in darkness.
  • Future releases could add longer connected traversals through multiple rooms and passages.

Load-bearing premise

The OptiTrack motion-capture system inside the cave delivers millimeter-accurate 6-DoF poses at 120 Hz without significant occlusion or calibration drift under the cave's lighting and surface conditions.
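One way a dataset user might probe this premise is sketched below. It flags gaps in the ground-truth timestamps (a possible sign of marker occlusion) and computes the RMS scatter of a stationary recording as a rough repeatability figure. The function names, the 1.5-period gap threshold, and the input layout are illustrative assumptions, not part of the CAVERS release or of any OptiTrack tooling.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def find_dropouts(timestamps: Sequence[float],
                  rate_hz: float = 120.0,
                  slack: float = 1.5) -> List[Tuple[float, float]]:
    """Return (start, end) timestamp pairs whose spacing exceeds
    `slack` nominal sample periods (assumed threshold)."""
    max_dt = slack / rate_hz
    return [(t0, t1)
            for t0, t1 in zip(timestamps, timestamps[1:])
            if t1 - t0 > max_dt]


def static_repeatability(positions: Sequence[Vec3]) -> float:
    """RMS distance of stationary samples from their mean position,
    in the same units as the input."""
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = [sum(p[i] for p in positions) / n for i in range(3)]
    sq = sum(sum((p[i] - mean[i]) ** 2 for i in range(3))
             for p in positions)
    return math.sqrt(sq / n)


# Hypothetical data: a 120 Hz stream with one gap, and two static samples.
stamps = [0 / 120, 1 / 120, 2 / 120, 10 / 120, 11 / 120]
gaps = find_dropouts(stamps)
scatter = static_repeatability([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

A low scatter on a static recording says nothing about drift over a long sequence, so a check of this kind would complement, not replace, the cross-validation the referee report asks for.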

What would settle it

Independent alignment of overlapping point clouds from separate sequences revealing that the released OptiTrack trajectories deviate by more than a few millimeters from the actual motion.

Figures

Figures reproduced from arXiv: 2604.15052 by Alfonso Martínez-Petersen, C. J. Pérez-del-Pulgar, David Rodríguez-Martínez, Giacomo Franchini, Marcello Chiaberge.

Figure 1: Overview of the dataset features: multimodal visual, range, and inertial data (Section II-B) are collected using multiple …
Figure 2: Comparison of sensor modalities across different lighting scenarios: LED torch ( …
Figure 3: The checkerboard target employed for (a) RGB and …
Figure 4: The dataset structure.
III. EVALUATION, A. SLAM and odometry: The dataset’s main goal is to provide the robotics community with valuable data to test and benchmark the trajectory accuracy of SLAM algorithms in natural underground environments. To validate its usability, we evaluated the performance of state-of-the-art SLAM and odometry algorithms on our dataset. For visual SLAM, we tested ORBSLAM3 [27] (R…
Figure 5: On the top row, the trajectories estimated from the benchmarked algorithms are reported. For quantitative results, …

Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this environment remain scarce and offer limited sensing modalities and environmental diversity. We present CAVERS, a multimodal dataset acquired in two structurally distinct rooms of Cueva de la Victoria, Málaga, Spain, comprising 24 sequences totaling approximately 335 GB of recorded data. The sensor suite combines an Intel RealSense D435i RGB-D-I camera, an Optris PI640i near-IR thermal camera, and a Velodyne VLP-16 LiDAR, operated both handheld and mounted on a wheeled rover under full darkness and artificial illumination. For most of the sequences, mm-accurate 6-DoF ground truth pose and velocity at 120 Hz are provided by an Optirack motion capture system installed directly inside the cave. We benchmark seven state-of-the-art SLAM and odometry algorithms spanning visual, visual-inertial, thermal-inertial, and LiDAR-based pipelines, as well as a 3D reconstruction pipeline, demonstrating the dataset's usability. %The dataset and all supplementary material are publicly available at: https://github.com/spaceuma/cavers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents CAVERS, a multimodal dataset of 24 sequences (~335 GB) collected in two rooms of a natural karstic cave using an Intel RealSense D435i (RGB-D-I), Optris PI640i thermal camera, and Velodyne VLP-16 LiDAR, operated both handheld and on a rover under darkness and artificial light. For most sequences it supplies 6-DoF ground truth at 120 Hz from an in-cave OptiTrack system and benchmarks seven SLAM/odometry algorithms (visual, visual-inertial, thermal-inertial, LiDAR) plus a 3D reconstruction pipeline to show usability.

Significance. If the ground-truth accuracy claim is substantiated, the dataset would fill a genuine gap: publicly available multimodal cave data with irregular geometry, wet reflective surfaces, and near-zero ambient light remain scarce. The combination of thermal, RGB-D, and LiDAR modalities plus rover/handheld modes and public release would enable more realistic SLAM evaluation than existing mine/tunnel datasets. The benchmarking exercise, while preliminary, provides an initial demonstration of utility.

major comments (2)
  1. [Abstract / Ground Truth section] Abstract and Ground-Truth description: the repeated claim of 'mm-accurate 6-DoF ground truth at 120 Hz' from the OptiTrack system is load-bearing for the dataset's value as a SLAM benchmark, yet no cave-specific error metrics (static repeatability, drift over sequence length, occlusion statistics, or cross-validation against another sensor) are supplied. Manufacturer specifications alone do not establish performance under the reported conditions of reflective wet surfaces and potential IR interference from the thermal camera.
  2. [Benchmarking / Results section] Benchmarking section: the statement that seven algorithms were evaluated 'demonstrating the dataset's usability' is not accompanied by any quantitative results (absolute trajectory error, relative pose error, failure rates, or per-modality comparisons). Without these numbers or failure-case statistics the central claim that the data are usable for SLAM research cannot be assessed.
minor comments (3)
  1. [Abstract] Abstract contains the typo 'Optirack' (should be 'OptiTrack').
  2. [Abstract] The GitHub link to the dataset is commented out in the abstract; it should be restored and verified.
  3. [Methods / Sensor Suite] Sensor mounting details, exact calibration procedures between the three modalities, and synchronization timestamps are not described at a level that would allow independent reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the importance of substantiating the ground-truth accuracy and providing quantitative benchmarking results. We address each major comment below and will revise the manuscript to strengthen these aspects.

  1. Referee: [Abstract / Ground Truth section] Abstract and Ground-Truth description: the repeated claim of 'mm-accurate 6-DoF ground truth at 120 Hz' from the OptiTrack system is load-bearing for the dataset's value as a SLAM benchmark, yet no cave-specific error metrics (static repeatability, drift over sequence length, occlusion statistics, or cross-validation against another sensor) are supplied. Manufacturer specifications alone do not establish performance under the reported conditions of reflective wet surfaces and potential IR interference from the thermal camera.

    Authors: We agree that cave-specific validation metrics are necessary to support the mm-accurate ground-truth claim under the reported conditions. The OptiTrack system was calibrated inside the cave, and the sensor suite was arranged to reduce marker occlusion and IR overlap with the thermal camera. However, the manuscript does not report quantitative error analysis. In the revised version we will add a dedicated paragraph to the Ground-Truth section containing: (i) static repeatability measurements performed in the cave, (ii) observed drift statistics over the recorded sequence lengths, (iii) occlusion rates encountered during data collection, and (iv) a brief discussion of potential thermal-camera IR interference together with the mitigation steps taken. These additions will directly address the referee’s concern. revision: yes

  2. Referee: [Benchmarking / Results section] Benchmarking section: the statement that seven algorithms were evaluated 'demonstrating the dataset's usability' is not accompanied by any quantitative results (absolute trajectory error, relative pose error, failure rates, or per-modality comparisons). Without these numbers or failure-case statistics the central claim that the data are usable for SLAM research cannot be assessed.

    Authors: We acknowledge that the current text presents the benchmarking exercise largely qualitatively. Although the seven algorithms were run on the sequences, the manuscript does not tabulate the resulting error metrics. In the revised manuscript we will expand the Benchmarking section with a table reporting Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) for each algorithm and modality, together with failure rates and notes on the most challenging sequences (e.g., low-light or highly reflective surfaces). This will supply the concrete quantitative evidence needed to evaluate the dataset’s utility. revision: yes
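As a rough illustration of the metric the authors promise to tabulate, the sketch below pairs estimated positions with ground-truth samples by nearest timestamp and computes a translational ATE RMSE. It assumes both trajectories are already expressed in a common frame; a full evaluation would normally add a rigid-body alignment step first, as covered in the trajectory-evaluation tutorial cited as [32]. The names `associate` and `ate_rmse`, the sample layout, and the half-period tolerance are hypothetical choices, not the paper's evaluation code.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Sample = Tuple[float, Vec3]  # (timestamp in seconds, xyz position)


def associate(est: Sequence[Sample],
              gt: Sequence[Sample],
              max_dt: float = 1.0 / 240.0) -> List[Tuple[Vec3, Vec3]]:
    """Pair each estimate with the nearest ground-truth sample in time.
    `gt` must be sorted by timestamp; pairs further apart than `max_dt`
    (assumed: half of a 120 Hz period) are discarded."""
    gt_times = [t for t, _ in gt]
    pairs = []
    for t, p in est:
        i = bisect.bisect_left(gt_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gt)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(gt_times[k] - t))
        if abs(gt_times[j] - t) <= max_dt:
            pairs.append((p, gt[j][1]))
    return pairs


def ate_rmse(pairs: Sequence[Tuple[Vec3, Vec3]]) -> float:
    """Root-mean-square translational error over associated pairs,
    without any alignment (trajectories assumed pre-aligned)."""
    if not pairs:
        raise ValueError("no associated pairs")
    sq = sum(sum((a - b) ** 2 for a, b in zip(pe, pg))
             for pe, pg in pairs)
    return math.sqrt(sq / len(pairs))


# Hypothetical data: 120 Hz ground truth and three estimated poses,
# the last of which has no ground-truth sample close enough in time.
gt = [(k / 120.0, (float(k), 0.0, 0.0)) for k in range(5)]
est = [(0.0, (0.0, 0.0, 0.0)),
       (2 / 120.0 + 0.001, (2.0, 3.0, 4.0)),
       (1.0, (9.0, 9.0, 9.0))]
pairs = associate(est, gt)
error = ate_rmse(pairs)
```

Counting how many estimates fail to associate, alongside the error itself, gives a crude failure-rate figure of the kind the referee report also requests.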

Circularity Check

0 steps flagged

No circularity: empirical dataset paper with no derivation chain


The paper presents a multimodal cave dataset and benchmarks existing SLAM algorithms on it. No mathematical derivations, parameter fittings, predictions, or self-referential models are claimed. Ground-truth pose is supplied by an external commercial motion-capture system; the paper states its use without deriving or fitting the accuracy figure internally. Benchmarking results are direct empirical outputs on the released data, not reductions to fitted inputs. The contribution is self-contained data collection and public release, with no load-bearing steps that collapse by construction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is a dataset release rather than a theoretical model, so it introduces no free parameters or invented entities; it rests on standard assumptions about sensor calibration and motion-capture accuracy.

axioms (1)
  • domain assumption The OptiTrack motion-capture system installed directly inside the cave supplies mm-accurate 6-DoF pose and velocity at 120 Hz for most sequences.
    Explicitly stated in the abstract as the source of ground truth.

pith-pipeline@v0.9.0 · 5568 in / 1306 out tokens · 49471 ms · 2026-05-10T11:06:51.110565+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    D. C. Culver and T. Pipan, The Biology of Caves and Other Subterranean Habitats. Oxford University Press, 04 2019

  2. [2]

    Into the unknown: Microbial communities in caves, their role, and potential use,

    K. Kosznik-Kwaśnicka, P. Golec, W. Jaroszewicz, D. Lubomska, and L. Piechowicz, “Into the unknown: Microbial communities in caves, their role, and potential use,” Microorganisms, vol. 10, no. 2, p. 222, 2022

  3. [3]

    D. Ford and P. Williams, Karst Water Resources Management. John Wiley & Sons, Ltd, 2007, ch. 11, pp. 441–469

  4. [4]

    Autonomous cave surveying with an aerial robot,

    W. Tabib, K. Goel, J. Yao, C. Boirum, and N. Michael, “Autonomous cave surveying with an aerial robot,” IEEE Transactions on Robotics, vol. 38, no. 2, pp. 1016–1032, 2022

  5. [6]

    Field-hardened robotic autonomy for subterranean exploration,

    T. Dang, F. Mascarich, S. Khattak, H. Nguyen, N. Khedekar, C. Papachristos, and K. Alexis, “Field-hardened robotic autonomy for subterranean exploration,” in Field and Service Robotics, 08 2019

  6. [7]

    The DARPA subterranean challenge: A synopsis of the circuits stage,

    V. L. Orekhov and T. H. Chung, “The DARPA subterranean challenge: A synopsis of the circuits stage,” Field Robotics, vol. 2, pp. 735–747, 2022

  7. [8]

    CERBERUS in the DARPA Subterranean Challenge,

    M. Tranzatto, T. Miki, M. Dharmadhikari, L. Bernreiter, M. Kulkarni, F. Mascarich, O. Andersson, S. Khattak, M. Hutter, R. Siegwart, and K. Alexis, “CERBERUS in the DARPA Subterranean Challenge,” Science Robotics, vol. 7, no. 66, p. eabp9742, 2022

  8. [9]

    Present and future of SLAM in extreme environments: The DARPA SubT challenge,

    K. Ebadi, L. Bernreiter, H. Biggie, G. Catt, Y. Chang, A. Chatterjee, C. E. Denniston, S.-P. Deschênes, K. Harlow, S. Khattak, L. Nogueira, M. Palieri, P. Petráček, M. Petrlík, A. Reinke, V. Krátký, S. Zhao, A.-a. Agha-mohammadi, K. Alexis, C. Heckman, K. Khosoussi, N. Kottege, B. Morrell, M. Hutter, F. Pauling, F. Pomerleau, M. Saska, S. Scherer,...

  9. [10]

    Vision-LiDAR-inertial localization and mapping dataset of a mining cave,

    Y. Zhou, S. Zhu, and Y. Li, “Vision-LiDAR-inertial localization and mapping dataset of a mining cave,” in Intelligent Robotics and Applications, H. Yang, H. Liu, J. Zou, Z. Yin, L. Liu, G. Yang, X. Ouyang, and Z. Wang, Eds. Singapore: Springer Nature Singapore, 2023, pp. 411–422

  10. [11]

    LOCUS 2.0: Robust and computationally efficient lidar odometry for real-time 3D mapping,

    A. Reinke, M. Palieri, B. Morrell, Y. Chang, K. Ebadi, L. Carlone, and A.-A. Agha-Mohammadi, “LOCUS 2.0: Robust and computationally efficient lidar odometry for real-time 3D mapping,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9043–9050, 2022

  11. [12]

    Chilean underground mine dataset,

    K. Leung, D. Lühr, H. Houshiar, F. Inostroza, D. Borrmann, M. Adams, A. Nüchter, and J. R. del Solar, “Chilean underground mine dataset,” The International Journal of Robotics Research, vol. 36, no. 1, pp. 16–23, 2017

  12. [13]

    MIN3D dataset: Multi-sensor 3D mapping with an unmanned ground vehicle,

    P. Trybała, J. Szrek, F. Remondino, P. Kujawa, J. Wodecki, J. Blachowski, and R. Zimroz, “MIN3D dataset: Multi-sensor 3D mapping with an unmanned ground vehicle,” PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 91, no. 6, pp. 425–442, 12 2023

  13. [14]

    Dataset collection from a SubT environment,

    A. Koval, S. Karlsson, S. S. Mansouri, C. Kanellakis, I. Tevetzidis, J. Haluska, A. akbar Agha-mohammadi, and G. Nikolakopoulos, “Dataset collection from a SubT environment,” Robotics and Autonomous Systems, vol. 155, p. 104168, 2022

  14. [15]

    Test your SLAM! the SubT-tunnel dataset and metric for mapping,

    J. G. Rogers, J. M. Gregory, J. Fink, and E. Stump, “Test your SLAM! the SubT-tunnel dataset and metric for mapping,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 955–961

  15. [16]

    CERBERUS in the DARPA subterranean challenge,

    M. Tranzatto, T. Miki, M. Dharmadhikari, L. Bernreiter, M. Kulkarni, F. Mascarich, O. Andersson, S. Khattak, M. Hutter, R. Siegwart, and K. Alexis, “CERBERUS in the DARPA subterranean challenge,” Science Robotics, vol. 7, no. 66, p. eabp9742, 2022

  16. [17]

    SubT-MRS dataset: Pushing SLAM towards all-weather environments,

    S. Zhao, Y. Gao, T. Wu, D. Singh, R. Jiang, H. Sun, M. Sarawata, Y. Qiu, W. Whittaker, I. Higgins, Y. Du, S. Su, C. Xu, J. Keller, J. Karhade, L. Nogueira, S. Saha, J. Zhang, W. Wang, C. Wang, and S. Scherer, “SubT-MRS dataset: Pushing SLAM towards all-weather environments,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog...

  17. [18]

    Multimodal dataset from harsh sub-terranean environment with aerosol particles for frontier exploration,

    A. Kyuroson, N. Dahlquist, N. Stathoulopoulos, V. K. Viswanathan, A. Koval, and G. Nikolakopoulos, “Multimodal dataset from harsh sub-terranean environment with aerosol particles for frontier exploration,” in 2023 31st Mediterranean Conference on Control and Automation (MED), 2023, pp. 716–721

  18. [19]

    A benchmark for visual-inertial odometry systems employing onboard illumination,

    M. Kasper, S. McGuire, and C. Heckman, “A benchmark for visual-inertial odometry systems employing onboard illumination,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 5256–5263

  19. [20]

    Fundamental science and engineering questions in planetary cave exploration,

    J. J. Wynne, T. N. Titus, A.-a. Agha-Mohammadi, A. Azua-Bustos, P. J. Boston, P. de León, C. Demirel-Floyd, J. De Waele, H. Jones, M. J. Malaska, A. Z. Miller, H. M. Sapers, F. Sauro, D. L. Sonderegger, K. Uckert, U. Y. Wong, E. C. Alexander Jr., L. Chiao, G. E. Cushing, J. DeDecker, A. G. Fairén, A. Frumkin, G. L. Harris, M. L. Kearney, L. Kerber, R...

  20. [21]

    Robotic lava tube mapping and multimodal data collection using quadruped and LiDAR,

    A. J. Hidding, L. Peternel, A. J. Becoy, F. Romio, and G. C. Calabrese, “Robotic lava tube mapping and multimodal data collection using quadruped and LiDAR,” May 2025

  21. [22]

    NASA planetary pits and caves analog dataset,

    U. Wong, W. Whittaker, H. Jones, and R. Whittaker, “NASA planetary pits and caves analog dataset,” Dec. 2014

  22. [23]

    Large-scale exploration of cave environments by unmanned aerial vehicles,

    P. Petráček, V. Krátký, M. Petrlík, T. Báča, R. Kratochvíl, and M. Saska, “Large-scale exploration of cave environments by unmanned aerial vehicles,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7596–7603, 2021

  23. [24]

    BASEPROD: The Bardenas semi-desert planetary rover dataset,

    L. Gerdes, T. Wiese, R. C. Arquillo, L. Bielenberg, M. Azkarate, H. Leblond, F. Wilting, J. O. Cortés, A. Bernal, S. Palanco, and C. P. del Pulgar, “BASEPROD: The Bardenas semi-desert planetary rover dataset,” Scientific Data, vol. 11, p. 1054, 9 2024

  24. [25]

    A flexible new technique for camera calibration,

    Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000

  25. [26]

    A four-step camera calibration procedure with implicit image correction,

    J. Heikkila and O. Silven, “A four-step camera calibration procedure with implicit image correction,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997, pp. 1106–1112

  26. [27]

    ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,

    C. Campos, R. Elvira, J. J. G. Rodriguez, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, Dec. 2021

  27. [28]

    RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation,

    M. Labbé and F. Michaud, “RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation,” Journal of Field Robotics, vol. 36, no. 2, pp. 416–446, Oct. 2018

  28. [29]

    ROVTIO: Robust visual thermal inertial odometry,

    H. D. Flemmen, “ROVTIO: Robust visual thermal inertial odometry,” Master’s thesis, Norwegian University of Science and Technology, 2021

  29. [30]

    KISS-ICP: In defense of point-to-point ICP – simple, accurate, and robust registration if done the right way,

    I. Vizzo, T. Guadagnino, B. Mersch, L. Wiesmann, J. Behley, and C. Stachniss, “KISS-ICP: In defense of point-to-point ICP – simple, accurate, and robust registration if done the right way,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 1029–1036, Feb. 2023

  30. [31]

    GenZ-ICP: Generalizable and degeneracy-robust LiDAR odometry using an adaptive weighting,

    D. Lee, H. Lim, and S. Han, “GenZ-ICP: Generalizable and degeneracy-robust LiDAR odometry using an adaptive weighting,” IEEE Robotics and Automation Letters, vol. 10, no. 1, pp. 152–159, Jan. 2025

  31. [32]

    A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry,

    Z. Zhang and D. Scaramuzza, “A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7244–7251

  32. [33]

    High-fidelity 3D reconstruction for planetary exploration,

    A. Martínez-Petersen, L. Gerdes, D. Rodríguez-Martínez, and C. J. Pérez-del Pulgar, “High-fidelity 3D reconstruction for planetary exploration,” in Proceedings of the 2026 IEEE Conference on Artificial Intelligence (CAI), 2026, to appear

  33. [34]

    Splatfacto-W: A nerfstudio implementation of gaussian splatting for unconstrained photo collections,

    C. Xu, J. Kerr, and A. Kanazawa, “Splatfacto-W: A nerfstudio implementation of gaussian splatting for unconstrained photo collections,” 2024

  34. [35]

    Structure-from-motion revisited,

    J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016