
arXiv: 2606.04569 · v1 · pith:JUKR75SY · submitted 2026-06-03 · 💻 cs.RO

MineXplore: An Open-Source Reinforcement Learning Exploration Benchmark for GNSS-Denied Underground Environment

Pith reviewed 2026-06-28 06:17 UTC · model grok-4.3

classification 💻 cs.RO
keywords reinforcement learning · underground navigation · MuJoCo benchmark · exploration · GNSS-denied environments · robotics simulation · mine mapping

The pith

MineXplore converts real underground mine survey data into a MuJoCo environment for training reinforcement learning navigation policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MineXplore, an open-source benchmark that reconstructs a 104,423 square meter Chilean copper mine tunnel network for reinforcement learning experiments. It builds the simulation from existing survey contours through a six-stage process that adds realistic wall shapes, terrain variations, an incline, and lighting. Geometric accuracy reaches an IoU of 0.9538 against the source map, and a PPO agent trained across five seeds achieves a best rolling coverage of 88.89 percent, with three of the five runs reaching the 90 percent target. This setup fills a gap for testing autonomous robots in GPS-denied, loop-rich underground spaces, where no comparable open benchmark previously existed.

Core claim

MineXplore is a MuJoCo-based navigation benchmark derived from the Leung et al. 2017 dataset via a six-stage contour-to-MJCF pipeline that produces octagonal wall cross-sections, LiDAR-sourced jagged wall geometry, three terrain friction zones, a global 5-degree incline, and periodic spot lighting. The model is validated at 0.9538 IoU and 79.4 percent surface-texture similarity, and is shown to support stable PPO policy learning with a best rolling coverage of 88.89 percent across five random seeds.
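The incline and friction zones named in the core claim map onto standard MJCF attributes. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it realizes the global 5-degree incline by tilting the `<option gravity>` vector (the paper may instead tilt the geometry), and it models the three friction zones as floor boxes with different sliding-friction coefficients. The function names, zone names, extents, and coefficients are hypothetical placeholders.

```python
import math
import xml.etree.ElementTree as ET

G = 9.81  # standard gravity, m/s^2

def inclined_gravity(incline_deg: float, g: float = G) -> tuple[float, float, float]:
    """Tilt gravity about the y-axis so a flat floor behaves like a slope rising along +x."""
    theta = math.radians(incline_deg)
    return (-g * math.sin(theta), 0.0, -g * math.cos(theta))

# Hypothetical zones: (name, x_min, x_max, sliding friction coefficient).
FRICTION_ZONES = [
    ("dry_rock", 0.0, 100.0, 1.0),
    ("loose_gravel", 100.0, 200.0, 0.6),
    ("wet_mud", 200.0, 300.0, 0.3),
]

def build_floor_mjcf(zones, incline_deg: float = 5.0, half_width: float = 50.0) -> str:
    """Return an MJCF string with tilted gravity and one floor box per friction zone."""
    mujoco = ET.Element("mujoco", model="incline_friction_sketch")
    gx, gy, gz = inclined_gravity(incline_deg)
    ET.SubElement(mujoco, "option", gravity=f"{gx:.4f} {gy:.4f} {gz:.4f}")
    world = ET.SubElement(mujoco, "worldbody")
    for name, x0, x1, mu in zones:
        half_len = (x1 - x0) / 2.0
        ET.SubElement(
            world, "geom", name=name, type="box",
            # Box top surface sits at z = 0; MJCF box sizes are half-extents.
            pos=f"{x0 + half_len:.3f} 0 -0.05",
            size=f"{half_len:.3f} {half_width:.3f} 0.05",
            # MJCF geom friction is "sliding torsional rolling".
            friction=f"{mu} 0.005 0.0001",
        )
    return ET.tostring(mujoco, encoding="unicode")

print(build_floor_mjcf(FRICTION_ZONES))
```

Rotating gravity rather than the floor keeps every geom axis-aligned, which makes zone boundaries easy to place; the trade-off is that anything assuming a vertical world z-axis now sees a 5-degree offset.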

What carries the argument

The six-stage contour-to-MJCF pipeline that turns survey contours into a MuJoCo model incorporating realistic tunnel cross-sections, jagged walls, friction zones, incline, and lighting.
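Only Stage 1 (source data and scale calibration) is named on this page, so the other stages are not reproduced here. As a minimal, hypothetical sketch of one step such a pipeline plausibly contains, the function below extrudes a scaled 2D survey contour into vertical MJCF wall boxes, one per segment. The function name, wall height, wall thickness, and `scale` (metres per survey unit) are assumptions; octagonal cross-sections and LiDAR-sourced jagged geometry are not modelled.

```python
import math
import xml.etree.ElementTree as ET

def contour_to_wall_geoms(points, wall_height=4.0, wall_thickness=0.3,
                          scale=1.0, closed=True):
    """Hypothetical helper: emit one MJCF box geom per contour segment.

    points: list of (x, y) survey coordinates; scale converts them to metres.
    Returns a <worldbody> element that can be appended to a <mujoco> root.
    """
    world = ET.Element("worldbody")
    n = len(points)
    n_segments = n if closed else n - 1
    for i in range(n_segments):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        x0, y0, x1, y1 = x0 * scale, y0 * scale, x1 * scale, y1 * scale
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0.0:
            continue  # skip repeated vertices, which would give a zero-size box
        yaw_deg = math.degrees(math.atan2(y1 - y0, x1 - x0))
        ET.SubElement(
            world, "geom", name=f"wall_{i}", type="box",
            # Box centre sits at the segment midpoint, half a wall height above the floor.
            pos=f"{(x0 + x1) / 2:.3f} {(y0 + y1) / 2:.3f} {wall_height / 2:.3f}",
            # MJCF box sizes are half-extents: half length, half thickness, half height.
            size=f"{length / 2:.3f} {wall_thickness / 2:.3f} {wall_height / 2:.3f}",
            # Rotate about z only; MJCF angles default to degrees.
            euler=f"0 0 {yaw_deg:.3f}",
        )
    return world

root = ET.Element("mujoco", model="contour_walls_sketch")
root.append(contour_to_wall_geoms([(0, 0), (10, 0), (10, 10), (0, 10)]))
print(ET.tostring(root, encoding="unicode"))
```

Boxes per segment are the simplest watertight-looking extrusion; real survey contours would typically be simplified first so that very short segments do not produce thousands of tiny geoms.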

If this is right

  • The benchmark enables reproducible evaluation of single-agent exploration policies in non-convex, loop-rich tunnel networks.
  • Compatibility with RLlib supports GPU-accelerated training runs that produce stable results across random seeds.
  • The environment provides a standardized testbed for GNSS-denied navigation under realistic underground sensing conditions.
  • High IoU and surface similarity scores establish a baseline for comparing future model variants or sensor additions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar reconstruction pipelines could be applied to other mine survey datasets to expand the set of available underground benchmarks.
  • Policies trained here could be tested for transfer to physical robots operating in comparable real tunnels.
  • The setup allows direct comparison of different reinforcement learning algorithms or multi-agent variants within the same fixed environment.

Load-bearing premise

The converted simulation model matches the real mine's geometry and surface properties closely enough for reinforcement learning policies to be meaningful.

What would settle it

Retraining the PPO baseline after deliberately lowering the model's geometric IoU and checking whether the 88.89 percent coverage result disappears would test whether the fidelity level is required for the reported learning outcome.
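One way to set up this ablation, assuming the compiled model and the survey map are rasterised to the same binary occupancy grid, is to erode the tunnel mask step by step and record the IoU, |A ∩ B| / |A ∪ B|, of each degraded variant before retraining. The sketch below covers only the mask degradation and IoU bookkeeping; erosion is a hypothetical choice of degradation, the toy mask is illustrative, and the PPO retraining itself is not shown.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    # Two empty masks are treated as identical.
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def erode(mask: np.ndarray, steps: int = 1) -> np.ndarray:
    """4-neighbour binary erosion via array shifts; cells beyond the border count as wall."""
    m = mask.astype(bool)
    for _ in range(steps):
        p = np.pad(m, 1, constant_values=False)
        # A cell stays free only if it and its four neighbours are all free.
        m = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
             & p[1:-1, :-2] & p[1:-1, 2:])
    return m

# Toy "tunnel": a 10 x 16 free-space rectangle inside a 20 x 20 grid.
source = np.zeros((20, 20), dtype=bool)
source[5:15, 2:18] = True
for steps in range(3):
    print(steps, round(iou(source, erode(source, steps)), 3))
# prints: 0 1.0, then 1 0.7, then 2 0.45
```

Because erosion shrinks narrow passages first, it can also disconnect the tunnel graph, so a careful ablation would check connectivity alongside IoU.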

Figures

Figures reproduced from arXiv: 2606.04569 by Abhishek S, Badrikanath Praharaj, Sreeram MV.

Figure 1. Point cloud visualisation from the Chilean underground mine survey.
Figure 2. A. Source Data and Scale Calibration (Stage 1)
Figure 3. Published 2D survey floor-plan of the Chilean underground copper …
Figure 4. Detected contours overlaid on the binarised survey map. Outer tunnel …
Figure 2. MineXplore compilation pipeline from real Chilean mine survey data to a validated MJCF environment. Stage 1: 2D survey floor-plan and LiDAR …
Figure 5. Compiled MineXplore environment in the MuJoCo interactive viewer.
Figure 7. Pixel-wise geometric fidelity overlay. Green pixels indicate agreement; …
Figure 8. Per-axis surface-texture similarity comparison between the compiled …
Figure 9. Coverage fraction over training (mean ± 1 SD across five seeds; dotted line: 90% target). Mean rises monotonically from ∼5% to a best rolling average of 88.89% ± 1.74%; higher is better.
Figure 10. Per-episode collision rate over training (mean …
Figure 11. Cumulative episode reward over training (mean …
Original abstract

Underground mines present extreme conditions for autonomous robot navigation: GPS is denied, lighting is degraded, and tunnel topology is loop-rich and non-convex. Simulation benchmarks grounded in real production-mine geometry and compatible with GPU-accelerated learning pipelines do not yet exist in the open-source ecosystem. We present MineXplore, an open-source MuJoCo-based navigation benchmark derived from the Leung et al. 2017 Chilean underground copper mine dataset. The environment reconstructs a 104,423 m² tunnel network through a six-stage contour-to-MJCF pipeline incorporating octagonal wall cross-sections, LiDAR-sourced jagged wall geometry, three terrain friction zones, a global 5-degree incline, and periodic spot lighting. Geometric fidelity is validated at an Intersection over Union (IoU) of 0.9538 against the source survey map, and surface texture similarity scores 79.4% across six structural dimensions. A single-agent PPO baseline trained via RLlib across five independent random seeds achieves a best rolling coverage of 88.89% (3 of 5 seeds reaching the 90% coverage target), confirming that MineXplore supports stable and reproducible policy learning under realistic underground sensing and topology.
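The abstract reports a best rolling coverage and a 90% coverage target without defining either on this page. A minimal sketch under explicit assumptions: coverage is taken as the fraction of traversable grid cells visited in one episode, and rolling coverage as a trailing moving average over a window of episodes. The function names, grid discretisation, and window size are hypothetical; the paper's exact definitions may differ.

```python
from collections import deque

def coverage_fraction(visited: set[tuple[int, int]],
                      free_cells: set[tuple[int, int]]) -> float:
    """Share of traversable grid cells the agent entered during one episode."""
    if not free_cells:
        return 0.0
    # Ignore visited cells outside the free-space map (e.g. inside walls).
    return len(visited & free_cells) / len(free_cells)

def rolling_means(values, window: int) -> list[float]:
    """Trailing moving average, emitted once a full window of episodes exists."""
    buf, total, out = deque(), 0.0, []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            out.append(total / window)
    return out

def best_rolling_coverage(episode_coverages, window: int = 10) -> float:
    """Maximum of the rolling-average coverage curve (NaN if too few episodes)."""
    means = rolling_means(episode_coverages, window)
    return max(means) if means else float("nan")

free = {(r, c) for r in range(2) for c in range(4)}  # 8 free cells
print(coverage_fraction({(0, 0), (0, 1), (5, 5)}, free))  # 0.25
print(best_rolling_coverage([0.25, 0.5, 0.75, 1.0], window=2))  # 0.875
```

Taking the maximum of a smoothed curve is less sensitive to a single lucky episode than taking the maximum raw episode, but it is still a per-seed best value, which is why aggregation across seeds matters.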

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MineXplore, an open-source MuJoCo-based benchmark for RL exploration in GNSS-denied underground mines. Derived from the Leung et al. 2017 dataset via a six-stage contour-to-MJCF pipeline, it includes octagonal cross-sections, jagged walls, friction zones, an incline, and lighting. Validation shows an IoU of 0.9538 and 79.4% texture similarity. A PPO baseline trained with RLlib across 5 seeds achieves 88.89% best rolling coverage, with 3 seeds reaching 90%.

Significance. If the simulation fidelity is adequate, this benchmark fills an important gap in open-source tools for realistic underground robot navigation research. The grounding in real survey data and compatibility with RLlib are notable strengths that could facilitate reproducible studies on exploration policies in challenging environments.

major comments (2)
  1. [Abstract] Abstract: The claim that the PPO baseline confirms MineXplore supports 'stable and reproducible policy learning under realistic underground sensing and topology' depends on simulation fidelity, yet the reported validation (IoU 0.9538, 79.4% texture similarity) addresses only static geometry and surface properties; no evidence is given that dynamics, connectivity, or sensor behavior match the source survey sufficiently to rule out artifacts.
  2. [Abstract] Abstract / baseline results: The 'best rolling coverage of 88.89%' is obtained by post-hoc selection across seeds, with only 3 of 5 reaching the 90% target; this selection procedure and the absence of reported statistical significance testing or full hyperparameter details undermine the reproducibility and stability claims.
minor comments (2)
  1. [Abstract] The six-stage pipeline is referenced but the individual stages are not listed in the abstract; a concise enumeration or pointer to the methods section would improve clarity.
  2. The open-source release is a strength; ensure the repository includes exact environment configuration files, random seeds, and training scripts matching the reported baseline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript accordingly to improve clarity on validation scope and baseline reporting.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the PPO baseline confirms MineXplore supports 'stable and reproducible policy learning under realistic underground sensing and topology' depends on simulation fidelity, yet the reported validation (IoU 0.9538, 79.4% texture similarity) addresses only static geometry and surface properties; no evidence is given that dynamics, connectivity, or sensor behavior match the source survey sufficiently to rule out artifacts.

    Authors: We agree that the reported validation is limited to static geometric (IoU 0.9538) and textural (79.4%) fidelity derived from the Leung et al. 2017 survey. No direct validation of dynamics, connectivity, or sensor models against the source data is provided. The abstract claim regarding 'realistic underground sensing and topology' will be revised to specify that the benchmark is grounded in real geometry and topology while noting the absence of dynamic validation, to avoid overstating fidelity. revision: yes

  2. Referee: [Abstract] Abstract / baseline results: The 'best rolling coverage of 88.89%' is obtained by post-hoc selection across seeds, with only 3 of 5 reaching the 90% target; this selection procedure and the absence of reported statistical significance testing or full hyperparameter details undermine the reproducibility and stability claims.

    Authors: The referee correctly identifies that the reported 'best rolling coverage' reflects post-hoc selection across the five seeds, with only three reaching the 90% target. We will revise the abstract to report mean coverage and standard deviation across seeds instead, and we will add full hyperparameter details and any available statistical tests to the methods section to support reproducibility claims. revision: yes
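The distinction the referee draws, and the authors' commitment to report mean and standard deviation across seeds, can be made concrete with a small aggregation helper. This is a sketch only: the function name is hypothetical, the per-seed values are toy numbers and not the paper's results, and it uses the sample standard deviation (n − 1 denominator), which may differ from the convention behind the reported ± SD.

```python
import statistics

def summarize_seeds(best_per_seed: dict[int, float], target: float = 0.90) -> dict:
    """Aggregate each seed's best rolling coverage instead of reporting only the top seed."""
    values = list(best_per_seed.values())
    return {
        "n_seeds": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); needs at least two seeds.
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "best_seed_value": max(values),
        "n_reaching_target": sum(v >= target for v in values),
    }

# Hypothetical toy values for five seeds, NOT taken from the paper.
toy = {0: 0.70, 1: 0.80, 2: 0.85, 3: 0.92, 4: 0.98}
print(summarize_seeds(toy))
```

Reporting the mean, the SD, and the number of seeds reaching the target together shows how far a single best-seed figure sits from typical performance.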

Circularity Check

0 steps flagged

No circularity: benchmark derived from external data with standard RL baseline

Full rationale

The paper constructs MineXplore from the external Leung et al. 2017 survey dataset via a six-stage contour-to-MJCF pipeline, reports geometric validation (IoU 0.9538) against that source, and evaluates a standard off-the-shelf PPO implementation from RLlib across random seeds. No equations, fitted parameters, or predictions reduce to the paper's own inputs by construction; the central claim rests on external data and unmodified algorithms rather than self-definition or self-citation chains. This is the most common honest non-finding for benchmark papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard MuJoCo physics, an external survey dataset, and off-the-shelf PPO.

pith-pipeline@v0.9.1-grok · 5754 in / 1058 out tokens · 31392 ms · 2026-06-28T06:17:46.096855+00:00 · methodology


Reference graph

Works this paper leans on

15 extracted references · 4 canonical work pages · 2 internal anchors

  [1] K. Y. K. Leung, D. Luhr, H. Houshiar, F. Inostroza, D. Borrmann, M. Adams, A. Nüchter, and J. Ruiz-del-Solar, “Chilean underground mine dataset,” The International Journal of Robotics Research, vol. 36, no. 1, pp. 16–23, 2017. DOI: 10.1177/0278364916679497

  [2] M. Tranzatto, T. Miki, M. Dharmadhikari, L. Bernreiter, M. Kulkarni, F. Mascarich, O. Andersson, S. Khattak, M. Hutter, R. Siegwart, and K. Alexis, “CERBERUS in the DARPA Subterranean Challenge,” Science Robotics, vol. 7, no. 66, p. eabp9742, 2022.

  [3] K. Ebadi, L. Bernreiter, H. Biggie, G. Catt, Y. Chang et al., “Present and future of SLAM in extreme environments: The DARPA SubT Challenge,” IEEE Transactions on Robotics, vol. 40, pp. 936–959, 2024.

  [4] Open Robotics, “DARPA SubT Virtual Testbed,” 2021. [Online]. Available: https://github.com/osrf/subt

  [5] D. Perille, A. Truong, X. Xiao, and P. Stone, “Benchmarking metric ground navigation,” in Proc. IEEE Int. Symp. Safety, Security, and Rescue Robotics (SSRR), 2020.

  [6] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, and G. State, “Isaac Gym: High performance GPU based physics simulation for robot learning,” in NeurIPS Datasets and Benchmarks Track, 2021.

  [7] K. Zakka, B. Tabanpour, Q. Liao, M. Haiderbhai, S. Holt, J. Y. Luo, A. Allshire, E. Frey, K. Sreenath, L. A. Kahrs, C. Sferrazza, Y. Tassa, and P. Abbeel, “MuJoCo Playground,” arXiv preprint arXiv:2502.08844, 2025.

  [8] Y. Gao and K. Awuah-Offei, “Navigation in underground mine environments: A simulation framework for quadruped robots,” in Proc. IEEE Int. Conf. Automation Science and Engineering (CASE), 2025, pp. 1464–1469.

  [9] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv preprint arXiv:1707.06347, 2017.

  [10] M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, H. Tan, and O. G. Younis, “Gymnasium: A Standard Interface for Reinforcement Learning Environments,” arXiv preprint arXiv:2407.17032, 2024.

  [11] E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. Gonzalez, M. Jordan, and I. Stoica, “RLlib: Abstractions for Distributed Reinforcement Learning,” in Proc. 35th Int. Conf. Machine Learning (ICML), 2018, pp. 3053–3062.

  [12] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), 2012, pp. 5026–5033.

  [13] B. Yamauchi, “A frontier-based approach for autonomous exploration,” in Proc. IEEE Int. Symp. Computational Intelligence in Robotics and Automation (CIRA), 1997, pp. 146–151.

  [14] A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in Proc. 16th Int. Conf. Machine Learning (ICML), 1999, pp. 278–287.

  [15] M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, and R. Munos, “Unifying count-based exploration and intrinsic motivation,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 29, 2016.