arxiv: 2604.22482 · v2 · pith:ZIWHXRARnew · submitted 2026-04-24 · 💻 cs.CV · cs.GR

Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond

Pith reviewed 2026-05-08 12:26 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords panoramic 3D reconstruction · continuous trajectories · depth maps · 360 dataset · SLAM · laser scanning · benchmark · feed-forward models

The pith

Holo360D supplies the first large-scale dataset of continuous panoramic sequences with aligned high-completeness depth maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Holo360D to close a specific gap: existing panoramic 3D datasets are predominantly captured with 360 cameras fixed at discrete locations, so they lack the continuous trajectories needed for multi-view training. Current feed-forward reconstruction models already lose accuracy on panoramas because of spherical distortions, and the missing continuity makes multi-view learning harder still. The new dataset records 109,495 panoramas along continuous paths using a 3D laser scanner paired with a 360 camera, then runs online and offline SLAM followed by geometry denoising, mesh hole filling, and region-specific remeshing to produce registered point clouds, meshes, and depth maps. The authors fine-tune existing 3D reconstruction models on Holo360D, report that the dataset delivers stronger training signals, and propose it as a benchmark for panoramic 3D work.
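
To make the link between a panoramic depth map and a point cloud concrete, here is a minimal sketch of unprojecting an equirectangular (ERP) depth map into 3D points. It is an illustration under assumed conventions, not the authors' pipeline: the function name `erp_depth_to_points`, the axis convention, and the choice of depth as range along the ray are all assumptions.

```python
import numpy as np

def erp_depth_to_points(depth):
    """Unproject an equirectangular range map (H, W) to an (H, W, 3) point map.

    Assumed convention (not taken from the paper): depth stores Euclidean
    distance along each ray; longitude spans [-pi, pi) left to right, latitude
    spans [pi/2, -pi/2] top to bottom; y is up and z is forward.
    """
    h, w = depth.shape
    # Pixel centres mapped to spherical angles.
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit ray directions on the sphere.
    rays = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    return rays * depth[..., None]

# Toy input: every pixel is 2 units away, so all points lie on a sphere.
points = erp_depth_to_points(np.full((4, 8), 2.0))
```

Because every pixel maps to a unit ray, rows near the poles cover far less of the sphere than rows near the equator, which is one way to see the spherical distortion the paper refers to.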

Core claim

Holo360D is the first large-scale real-world dataset that supplies continuous panoramic sequences paired with accurately aligned high-completeness depth maps, registered point clouds, meshes, and camera poses. Raw data are captured with a 3D laser scanner and 360 camera, refined through SLAM systems, and cleaned by a post-processing pipeline of geometry denoising, mesh hole filling, and region-specific remeshing. Fine-tuning 3D reconstruction models on the dataset yields superior training signals compared with prior discrete-location collections.

What carries the argument

The Holo360D dataset of continuous panoramic sequences with SLAM-aligned high-completeness depth maps produced by laser scanning and a tailored post-processing pipeline.

If this is right

  • Panoramic feed-forward 3D reconstruction models gain stronger multi-view training signals from continuous trajectories.
  • The dataset functions as a standardized benchmark for evaluating and advancing panoramic 3D reconstruction methods.
  • Fine-tuned models exhibit improved handling of spherical distortions when trained on the aligned depth maps.
  • Public release of the data and code supports further development of related panoramic vision applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The focus on trajectory continuity implies that similar capture and processing choices could improve datasets for other wide-field sensors.
  • Better panoramic reconstruction models trained this way may directly aid indoor mapping and navigation systems that use consumer 360 cameras.
  • The post-processing steps could be tested on other large-scale 3D capture projects to reduce artifacts in depth maps.

Load-bearing premise

The post-processing pipeline of geometry denoising, mesh hole filling, and region-specific remeshing combined with online and offline SLAM produces sufficiently accurate alignments and high-completeness depth maps without major artifacts or biases.

What would settle it

If models fine-tuned on Holo360D show no accuracy improvement or even worse results on panoramic 3D reconstruction tasks than models trained on existing discrete panoramic datasets, the claim of superior training signals would be falsified.
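
One way such a comparison could be scored is sketched below: a solid-angle-weighted RMSE plus a ground-truth completeness ratio for equirectangular depth maps. This is a hypothetical evaluation helper, not a metric taken from the paper; the name `erp_depth_metrics`, the cos(latitude) weighting, and the convention that missing ground truth is stored as 0 or NaN are assumptions.

```python
import numpy as np

def erp_depth_metrics(pred, gt):
    """Solid-angle-weighted RMSE and ground-truth completeness for ERP depth.

    Assumptions: pred and gt are (H, W) arrays in the same metric units, and
    gt marks missing pixels as 0 or NaN. At least one pixel must be valid.
    """
    h, w = gt.shape
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    # Each ERP row covers a solid angle proportional to cos(latitude).
    weights = np.repeat(np.cos(lat)[:, None], w, axis=1)
    valid = np.isfinite(gt) & (gt > 0)
    completeness = weights[valid].sum() / weights.sum()
    err2 = (pred[valid] - gt[valid]) ** 2
    rmse = float(np.sqrt((weights[valid] * err2).sum() / weights[valid].sum()))
    return rmse, float(completeness)

# Toy data: the top row has no ground truth, predictions are off by 0.5.
gt = np.full((4, 8), 2.0)
gt[0, :] = 0.0
pred = gt + 0.5
rmse, completeness = erp_depth_metrics(pred, gt)
```

Weighting by solid angle keeps the heavily oversampled polar rows from dominating the score; an unweighted mean over pixels would be a simpler alternative with a different bias.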

Figures

Figures reproduced from arXiv: 2604.22482 by Hui Xiong, Jing Ou, Jinjing Zhu, Shuai Zhang, Tongyan Hua, Wufan Zhao, Yinrui Ren, Zhuoxiao Li, Zidong Cao.

Figure 1: We present Holo360D, the first large-scale real-world panoramic 3D dataset, containing 109,495 panoramas paired with LiDAR-derived ground truth, including precise meshes, point clouds, depth maps, and camera poses. More importantly, Holo360D is the first panoramic dataset to offer accurately aligned high-completeness depth maps with continuous camera trajectories over long sequences.
Figure 2: Comparison of depth maps across different panoramic …
Figure 3: Dataset creation pipeline consisting of (i) data collection, (ii) offline reconstruction, and (iii) data post-processing.
Figure 4: Data post-processing pipeline consisting of (i) data denoising, (ii) mesh hole filling, and (iii) region-specific remeshing.
Figure 5: Comparison of reconstructed mesh models on Matterport3D and Holo360D. Holo360D meshes exhibit higher completeness in …
Figure 6: Reference dimensions used to evaluate point cloud re…
Figure 7: Visualization of fine-tuning performance with different …
Figure 8: Visualization results comparing different view configurations and depth supervision types.
Figure 9: View decomposition strategies. The 8 views consists …
Figure 10: Visualization of baseline models fine-tuned on Holo360D. The blue arrows indicate viewpoints selected for zoom-in views.
Figure 11: Comparison of reconstructions in glass regions before and after finetuning. The finetuned …
Figure 12: Finetuning π³ on different datasets. Fine-tuning on Holo360D enables more accurate and complete reconstruction results than finetuning on Matterport3D.
Figure 13: Challenging scenes. Our dataset includes (a) low-texture and repetitive-texture scenes, (b) large, long-sequence scenes, and (c) …
Figure 14: Comparison of single-frame point clouds.
Figure 15: Qualitative comparison of sparse-view panoramic 3D reconstruction results. After finetuning with our dataset, the …
Figure 16: Qualitative comparison of single-view panoramic 3D reconstruction results. After finetuning with our dataset, the model …
Figure 17: Comparison between the advanced panoramic monocular depth estimation models (DA …
Figure 18: Degradation of reconstruction quality in distant regions.
Original abstract

While feed-forward 3D reconstruction models have advanced rapidly, they still exhibit degraded performance on panoramas due to spherical distortions. Moreover, existing panoramic 3D datasets are predominantly collected with 360 cameras fixed at discrete locations, resulting in discontinuous trajectories. These limitations critically hinder the development of panoramic feed-forward 3D reconstruction, especially for the multi-view setting. In this paper, we present Holo360D, a comprehensive dataset containing 109,495 panoramas paired with registered point clouds, meshes, and aligned camera poses. To our knowledge, Holo360D is the first large-scale dataset that provides continuous panoramic sequences with accurately aligned high-completeness depth maps. The raw data are initially collected using a 3D laser scanner coupled with a 360 camera. Subsequently, the raw data are processed with both online and offline SLAM systems. Furthermore, to enhance the 3D data quality, a post-processing pipeline tailored for the 360 dataset is proposed, including geometry denoising, mesh hole filling, and region-specific remeshing. Finally, we establish a new benchmark by fine-tuning 3D reconstruction models on Holo360D, providing key insights into effective fine-tuning strategies. Our results demonstrate that Holo360D delivers superior training signals and provides a comprehensive benchmark for advancing panoramic 3D reconstruction models. Datasets and Code will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Holo360D, a dataset of 109,495 panoramas captured along continuous trajectories and paired with registered point clouds, meshes, and camera poses. Raw data are captured with a 3D laser scanner and 360° camera, then processed via online and offline SLAM followed by a post-processing pipeline (geometry denoising, mesh hole filling, region-specific remeshing). The authors fine-tune existing 3D reconstruction models on the dataset to create a benchmark and claim that Holo360D supplies superior training signals for panoramic feed-forward reconstruction.

Significance. A high-quality, large-scale continuous-trajectory panoramic dataset with aligned depth would address a clear gap in multi-view 3D reconstruction research, where existing datasets are limited to discrete viewpoints. Public release of data and code is a concrete strength that could enable reproducible progress on spherical-distortion handling.

major comments (2)
  1. [Abstract and §4 (Benchmark)] The central claim that Holo360D 'delivers superior training signals' rests on fine-tuning results, yet no quantitative metrics (RMSE, completeness percentages, alignment error distributions, or controlled ablations against prior panoramic datasets) are reported. This absence directly undermines verification of the 'accurately aligned high-completeness depth maps' assertion.
  2. [§3.3 (Post-processing pipeline)] The description of geometry denoising, hole filling, and region-specific remeshing contains no before/after quantitative validation or error analysis. Given that the pipeline is load-bearing for the 'high-completeness' and 'artifact-free' properties, the lack of such evidence leaves the weakest assumption untested.
minor comments (2)
  1. [Abstract] The abstract states 'Datasets and Code will be made publicly available' without a specific URL or repository link; this should be added for reproducibility.
  2. [§2] Notation for 'continuous trajectories' versus 'discontinuous' baselines could be clarified with a short diagram or table comparing trajectory properties across datasets.
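
A table of trajectory properties like the one requested in minor comment 2 could be built from simple step-length statistics over camera positions. The sketch below is a hypothetical illustration: the helper names, the toy coordinates, and the use of median and maximum step length as a continuity proxy are assumptions, not quantities reported in the paper.

```python
import numpy as np

def step_lengths(positions):
    """Distances between consecutive camera centres, given an (N, 3) array."""
    positions = np.asarray(positions, dtype=float)
    return np.linalg.norm(np.diff(positions, axis=0), axis=1)

def trajectory_summary(positions):
    """Median and maximum step length, in the units of the input poses."""
    steps = step_lengths(positions)
    return {"median_step": float(np.median(steps)), "max_step": float(steps.max())}

# Hypothetical toy data: a walked path sampled every 0.1 m versus tripod stops.
walked = np.stack([np.arange(50) * 0.1, np.zeros(50), np.zeros(50)], axis=1)
tripod = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [2.5, 3.0, 0.0]])
walked_stats = trajectory_summary(walked)
tripod_stats = trajectory_summary(tripod)
```

In this toy case the walked path has steps of about 0.1 m while the tripod captures are metres apart; a real comparison would also need frame rate and scene scale to be meaningful.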

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the potential value of Holo360D in addressing gaps in panoramic 3D reconstruction. We address each major comment below and will revise the manuscript accordingly to provide stronger quantitative support.

Point-by-point responses
  1. Referee: [Abstract and §4 (Benchmark)] The central claim that Holo360D 'delivers superior training signals' rests on fine-tuning results, yet no quantitative metrics (RMSE, completeness percentages, alignment error distributions, or controlled ablations against prior panoramic datasets) are reported. This absence directly undermines verification of the 'accurately aligned high-completeness depth maps' assertion.

    Authors: We agree that the current version of the manuscript does not include the requested quantitative metrics or ablations in §4. In the revised manuscript we will expand the benchmark section to report RMSE, completeness percentages, alignment error distributions, and controlled comparisons against prior panoramic datasets. These additions will directly substantiate the claim of superior training signals and allow verification of the alignment and completeness properties.
    Revision: yes

  2. Referee: [§3.3 (Post-processing pipeline)] The description of geometry denoising, hole filling, and region-specific remeshing contains no before/after quantitative validation or error analysis. Given that the pipeline is load-bearing for the 'high-completeness' and 'artifact-free' properties, the lack of such evidence leaves the weakest assumption untested.

    Authors: We concur that quantitative before-and-after validation is required for the post-processing pipeline. We will augment §3.3 with error analyses and metrics quantifying the effects of geometry denoising, hole-filling success rates, and region-specific remeshing accuracy. These additions will provide concrete evidence supporting the high-completeness and artifact-free characteristics of the final data.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents an empirical dataset construction effort using laser scanning, 360 cameras, online/offline SLAM, and a post-processing pipeline of denoising, hole filling, and remeshing. No equations, parameter fittings, or mathematical derivations are described that could reduce to self-defined inputs or fitted quantities by construction. Claims of being the 'first large-scale dataset' with continuous trajectories and high-completeness depth maps rest on the described data collection process rather than any self-referential logic, self-citation chains, or renamed known results. The contribution is data release and benchmarking, with no load-bearing steps that collapse into their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the unverified accuracy of the custom post-processing pipeline and SLAM alignment for producing usable training data; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Online and offline SLAM systems can produce accurate camera pose alignment between 360 images and laser-scanned point clouds in real-world environments.
    Invoked in the data processing step described in the abstract.
  • domain assumption: The proposed post-processing steps (denoising, hole filling, region-specific remeshing) improve 3D data quality without introducing new errors that affect downstream model training.
    Stated as part of the pipeline to enhance 3D data quality.
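
The first assumption could in principle be probed by projecting scan points into a panorama with the estimated pose and comparing their ranges against the stored depth map. The sketch below shows that idea under assumed conventions; the function names, the camera-to-world pose convention, and the toy scene are hypothetical and are not described in the paper.

```python
import numpy as np

def project_to_erp(points_world, R_wc, t_wc, height, width):
    """Project world points into an ERP image of a camera with pose (R_wc, t_wc).

    Assumed convention: x_world = R_wc @ x_cam + t_wc, y up, z forward,
    longitude increasing to the right, latitude decreasing downwards.
    Returns integer pixel rows, columns, and the range of every point.
    """
    pts_cam = (np.asarray(points_world) - t_wc) @ R_wc  # equals R_wc.T @ (x - t)
    rng = np.linalg.norm(pts_cam, axis=1)
    lon = np.arctan2(pts_cam[:, 0], pts_cam[:, 2])
    lat = np.arcsin(np.clip(pts_cam[:, 1] / rng, -1.0, 1.0))
    cols = np.floor((lon / (2 * np.pi) + 0.5) * width).astype(int) % width
    rows = np.clip(np.floor((0.5 - lat / np.pi) * height).astype(int), 0, height - 1)
    return rows, cols, rng

def range_residuals(depth, points_world, R_wc, t_wc):
    """Absolute difference between the depth map and projected scan ranges."""
    rows, cols, rng = project_to_erp(points_world, R_wc, t_wc, *depth.shape)
    return np.abs(depth[rows, cols] - rng)

# Toy scene: the camera sits at t inside a sphere of radius 3.
t = np.array([1.0, 0.0, 0.0])
dirs = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
scan = t + 3.0 * dirs
depth = np.full((64, 128), 3.0)
good = range_residuals(depth, scan, np.eye(3), t)
bad = range_residuals(depth, scan, np.eye(3), t + np.array([0.2, 0.0, 0.0]))
```

With the correct pose the residuals vanish, while a 0.2-unit pose offset produces residuals of up to 0.2. A real check would also need occlusion handling, for example a z-buffer, which this sketch omits.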

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

    cs.CV 2026-05 unverdicted novelty 6.0

    Presents COVER, a greedy ERP viewpoint curator with coverage scoring and depth conflict penalization, and releases the CM-EVS dataset of 36k sparse panoramic RGB-D-pose frames from 1,275 indoor scenes plus outdoor data.