Multi-modal panoramic 3D outdoor datasets for place categorization

Hojung Jung; Oscar M. Mozos; Ryo Kurazume; Yuki Oto; Yumi Iwashita

arXiv: 2604.13142 · v1 · submitted 2026-04-14 · cs.RO · cs.CV · cs.DB

Pith reviewed 2026-05-10 14:45 UTC · model grok-4.3

classification cs.RO · cs.CV · cs.DB

keywords panoramic 3D datasets · place categorization · outdoor environments · laser scanning · point clouds · semantic classification · multi-modal data · robotics datasets

The pith

Two multi-modal panoramic 3D datasets support up to 96 percent accurate outdoor place categorization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases two public datasets of panoramic 3D scans collected in Fukuoka, Japan, to enable semantic categorization of places into six types: forest, coast, residential area, urban area, indoor parking lot, and outdoor parking lot. One dataset supplies 650 dense scans from a stationary laser scanner, each with about nine million points plus synchronized color images. The second supplies 34,200 sparser real-time scans of about seventy thousand reflectance points each, captured while driving. Tests of several classification approaches on these data reach 96.42 percent accuracy with the dense scans and 89.67 percent with the sparse scans. This matters because robots and vehicles operating outdoors need reliable ways to recognize what kind of surroundings they occupy so they can choose appropriate behaviors.

Core claim

We present two multi-modal panoramic 3D outdoor (MPO) datasets for semantic place categorization with six categories. The first consists of 650 static panoramic scans of dense 3D color and reflectance point clouds obtained with a FARO laser scanner. The second consists of 34,200 real-time panoramic scans of sparse 3D reflectance point clouds obtained with a Velodyne laser scanner while driving. The datasets are publicly available, and several approaches achieve best results of 96.42 percent accuracy on the dense data and 89.67 percent on the sparse data.
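For a sense of scale, the per-scan figures in the claim can be multiplied out. The sketch below is back-of-the-envelope arithmetic that assumes the approximate point counts quoted in the abstract (9,000,000 and 70,000 points per scan); the totals are not numbers reported by the authors.

```python
# Approximate totals derived from the per-scan figures quoted in the abstract.
DENSE_SCANS, DENSE_POINTS_PER_SCAN = 650, 9_000_000
SPARSE_SCANS, SPARSE_POINTS_PER_SCAN = 34_200, 70_000

dense_total = DENSE_SCANS * DENSE_POINTS_PER_SCAN
sparse_total = SPARSE_SCANS * SPARSE_POINTS_PER_SCAN
density_ratio = DENSE_POINTS_PER_SCAN / SPARSE_POINTS_PER_SCAN

print(f"dense total:  ~{dense_total / 1e9:.2f} billion points")
print(f"sparse total: ~{sparse_total / 1e9:.2f} billion points")
print(f"a dense scan holds ~{density_ratio:.0f}x the points of a sparse scan")
```

On these assumptions the sparse set has far more scans, while each dense scan carries roughly two orders of magnitude more points.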

What carries the argument

The MPO datasets of dense color-and-reflectance panoramic point clouds and sparse reflectance panoramic point clouds, which serve as training and test material for place categorization classifiers across the six categories.

If this is right

The dense dataset supplies high-resolution data suitable for detailed offline analysis of place features.
The sparse dataset supports real-time categorization while a vehicle is in motion.
The six categories create a concrete benchmark for distinguishing natural landscapes from built environments using 3D data.
Public release of both datasets lets other researchers train, test, and compare new categorization methods without new data collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The accuracy gap between dense and sparse scans suggests that future systems could trade sensor density for speed depending on the application.
The datasets could be combined with other sensor types such as cameras to test whether multi-modal fusion further improves robustness.
Extending the same scanning protocol to additional cities or seasons would test whether the categorization remains stable across different geographic conditions.

Load-bearing premise

The collected scans contain enough distinctive geometric and reflectance information to allow reliable separation of the six place categories.

What would settle it

Running a standard classifier on the publicly released datasets and obtaining accuracy close to the chance level of roughly 17 percent for six categories would show that the scans do not support the claimed categorization performance.
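The "roughly 17 percent" figure is the uniform guessing rate of 1/6. If the categories are not equally represented, always predicting the most frequent class sets a higher floor, so that baseline is worth checking too. A minimal sketch follows; the toy label counts are hypothetical and are not the datasets' real class distribution.

```python
from collections import Counter

CATEGORIES = [
    "forest", "coast", "indoor parking lot",
    "outdoor parking lot", "residential area", "urban area",
]

def uniform_chance(num_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random."""
    return 1.0 / num_classes

def majority_class_baseline(labels) -> float:
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

print(f"uniform chance: {uniform_chance(len(CATEGORIES)):.2%}")

# Hypothetical, deliberately imbalanced labels for illustration only.
toy_labels = ["forest"] * 3 + ["coast"] * 1 + ["urban area"] * 2
print(f"majority-class baseline on toy labels: {majority_class_baseline(toy_labels):.2%}")
```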

Figures

Figures reproduced from arXiv: 2604.13142 by Hojung Jung, Oscar M. Mozos, Ryo Kurazume, Yuki Oto, Yumi Iwashita.

**Figure 1.** An example map of MPO Dataset with six place categories: (1) forest, (2) coast, (3) indoor parking lot, (4) outdoor parking lot, (5) residential area and (6) urban area.

**Figure 2.** Experimental setup for Dense MPO Dataset equipped with (1) a FARO Focus3D sensor system and for Sparse MPO Dataset equipped with (2) a Velodyne HDL-32E laser scanner, (3) a Kodak PIXPRO SP360 camera and (4) a GARMIN GPS 18x LVC.

Table I: Dense MPO Dataset of outdoor scenes containing 650 pairs of range, reflectance and color images, with the number of scans per category listed by location (Set1, Set2, …).

**Figure 3.** Dense MPO Dataset: examples of high-resolution range, reflectance and color panoramic images for six outdoor place categories: forest, coast, indoor/outdoor parking lot, residential and urban area. In range images, darker colors indicate closer distances and in reflectance images, brighter colors indicate higher intensity.

**Figure 4.** Sparse MPO Dataset: examples of low-resolution range and reflectance panoramic images for six outdoor place categories: ‘forest’, ‘coast’, ‘indoor parking lot’, ‘outdoor parking lot’, ‘residential area’ and ‘urban area’. In range images, darker colors indicate closer distances and in reflectance images, brighter colors indicate higher intensity.
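To make the range-image convention in the captions concrete, here is a minimal sketch of how a 3D point cloud could be turned into a panoramic range image in which closer points are darker. It assumes a plain equirectangular (azimuth by elevation) mapping; the function names, image size and field-of-view values are hypothetical, and nothing here is taken from the authors' processing pipeline.

```python
import math

def project_to_range_image(points, width, height, elev_min_deg, elev_max_deg):
    """Project (x, y, z) points onto an equirectangular range image.

    Columns span azimuth [-180, 180) degrees; rows run from elev_max_deg
    (top) to elev_min_deg (bottom). Each pixel keeps the nearest range that
    falls into it; pixels with no return stay None.
    """
    image = [[None] * width for _ in range(height)]
    elev_span = elev_max_deg - elev_min_deg
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # the sensor origin has no direction
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.asin(z / rng))
        if not (elev_min_deg <= elevation <= elev_max_deg):
            continue  # outside the assumed vertical field of view
        col = int((azimuth + 180.0) / 360.0 * width) % width
        row = min(int((elev_max_deg - elevation) / elev_span * height), height - 1)
        if image[row][col] is None or rng < image[row][col]:
            image[row][col] = rng
    return image

def range_to_gray(rng, max_range):
    """Map a range to 0..255 so that closer points are darker."""
    return round(255 * min(rng, max_range) / max_range)

# Two hypothetical points along the same ray; the nearer one wins the pixel.
toy_image = project_to_range_image([(2.0, 2.0, 0.0), (1.0, 1.0, 0.0)], 36, 4, -20.0, 20.0)
print(toy_image[2][22], range_to_gray(toy_image[2][22], max_range=10.0))
```

Keeping only the nearest return per pixel is one simple way to resolve collisions; averaging or keeping the last return are equally plausible choices.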

**Figure 5.** A performance of Sparse MPO Dataset by applying a majority vote technique.

Table V: Correct classification ratio (CCR) results of multi-modalities using Dense MPO Dataset, Range + Reflectance modality: LBP [10] 95.67±3.69 %, LTP [12] 92.84±3.33 %.

Table VI: Correct classification ratio (CCR) results [%] of standard descriptors and majority vote using Sparse MPO Dataset …
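The majority vote mentioned for the sparse data can be illustrated with a sliding window over consecutive per-scan predictions, which suits scans captured in sequence while driving. This is a sketch under stated assumptions: the window length and the tie-breaking rule are hypothetical, and the exact voting scheme behind Figure 5 is not given on this page.

```python
from collections import Counter, deque

def majority_vote_smooth(predictions, window=5):
    """Replace each prediction with the most common label among the
    current and up to (window - 1) preceding predictions.

    Ties go to the label that entered the window first.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for label in predictions:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

# Hypothetical per-scan outputs with one isolated misclassification.
raw = ["urban area", "urban area", "coast", "urban area", "urban area"]
print(majority_vote_smooth(raw, window=3))
```

A causal window like this one uses only past scans, so it could run online; the cost is a short lag whenever the vehicle genuinely crosses from one category into another.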

Original abstract

We present two multi-modal panoramic 3D outdoor (MPO) datasets for semantic place categorization with six categories: forest, coast, residential area, urban area and indoor/outdoor parking lot. The first dataset consists of 650 static panoramic scans of dense (9,000,000 points) 3D color and reflectance point clouds obtained using a FARO laser scanner with synchronized color images. The second dataset consists of 34,200 real-time panoramic scans of sparse (70,000 points) 3D reflectance point clouds obtained using a Velodyne laser scanner while driving a car. The datasets were obtained in the city of Fukuoka, Japan and are publicly available in [1], [2]. In addition, we compare several approaches for semantic place categorization with best results of 96.42% (dense) and 89.67% (sparse).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper's main value is releasing two new public multi-modal panoramic 3D outdoor datasets for place categorization, one dense and one sparse, with baseline results.

The paper's main contribution is the public release of two multi-modal panoramic 3D outdoor datasets for semantic place categorization. The first is 650 static dense scans from a FARO laser scanner, each with about 9 million points including color and reflectance. The second is 34,200 sparse scans from a Velodyne scanner on a moving car, each with about 70,000 reflectance points. Both were collected in Fukuoka, Japan, and cover six categories: forest, coast, residential area, urban area, and indoor/outdoor parking lots. The authors also provide baseline categorization results, the best being 96.42 percent on the dense dataset and 89.67 percent on the sparse one.

This work does a good job of documenting the collection process and making the data available. Having both dense static and sparse mobile versions is practical for different use cases in outdoor robotics. The point counts and scan quantities give a clear picture of the scale.

One soft spot is that the abstract only sketches the classification approaches, with little on the specific techniques, validation methods, or potential issues such as class imbalance or location-specific biases. That makes it harder to assess how well the high accuracies generalize. The categories are also quite distinct, which might make the task less challenging than real-world mixed scenes.

This paper is for researchers in robotics and computer vision who focus on place recognition or semantic mapping in outdoor environments. It would be particularly useful for those looking for 3D data to train or evaluate models on panoramic scans.

I think it deserves serious peer review. The core is the dataset release, and if the full paper includes proper documentation and reproducible baselines, it can be a solid resource for the community.

Referee Report

1 major / 2 minor

Summary. The paper presents two multi-modal panoramic 3D outdoor (MPO) datasets for semantic place categorization into six categories (forest, coast, residential area, urban area, indoor/outdoor parking lot). The dense dataset comprises 650 static scans (~9M points each) captured with a FARO scanner including synchronized color images; the sparse dataset comprises 34,200 dynamic scans (~70k points each) captured with a Velodyne scanner during driving. Both were collected in Fukuoka, Japan, are made publicly available, and the manuscript supplies baseline categorization results reaching 96.42% (dense) and 89.67% (sparse).

Significance. The public release of paired dense-static and sparse-dynamic multi-modal outdoor 3D scans fills a practical gap for place-categorization research in robotics. The scale (650 + 34,200 scans) and the explicit provision of both reflectance and color modalities enable direct comparison of algorithms across density regimes. If the baselines are reproducible, the datasets become a concrete benchmark resource rather than an unverified archive.

major comments (1)

[Experimental results / baseline comparison] The experimental results section reports concrete peak accuracies (96.42% dense, 89.67% sparse) but supplies no description of the feature representations, classifiers, train/test partitioning, or cross-validation procedure used to obtain them. Because these numbers are offered as evidence of the datasets' utility for place categorization, the absence of the evaluation protocol is load-bearing for the empirical claim.

minor comments (2)

[Abstract] The abstract states 'six categories' yet enumerates only five items (forest, coast, residential area, urban area, and indoor/outdoor parking lot). Clarify whether indoor and outdoor parking are treated as distinct classes or whether the list is incomplete.
[Dataset description] The public availability statements cite [1] and [2] but do not include DOIs, repository URLs, or license information in the main text; add these to the dataset-description section for immediate accessibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the datasets' significance and the recommendation for minor revision. We address the single major comment below.

Referee: [Experimental results / baseline comparison] The experimental results section reports concrete peak accuracies (96.42% dense, 89.67% sparse) but supplies no description of the feature representations, classifiers, train/test partitioning, or cross-validation procedure used to obtain them. Because these numbers are offered as evidence of the datasets' utility for place categorization, the absence of the evaluation protocol is load-bearing for the empirical claim.

Authors: We agree that the experimental protocol was not described in sufficient detail. In the revised manuscript we will expand the relevant section to specify the feature representations, the classifiers evaluated, the train/test partitioning (including any scene-level separation to avoid leakage), and the cross-validation procedure that produced the reported peak accuracies. These additions will render the baselines reproducible and directly support the claim of dataset utility.

revision: yes
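The scene-level separation mentioned in the response can be sketched as a leave-one-location-out split, where every scan from one capture location is held out together so that near-duplicate scans never straddle train and test. This is an illustration only: the record layout, the `location` field and the toy samples are hypothetical, and the page does not state that the authors used this protocol.

```python
from statistics import mean, stdev

def leave_one_location_out(samples):
    """Yield (held_out_location, train, test) folds.

    All scans from the held-out location go to the test split, so no
    location contributes to both sides of a fold.
    """
    locations = sorted({s["location"] for s in samples})
    for held_out in locations:
        train = [s for s in samples if s["location"] != held_out]
        test = [s for s in samples if s["location"] == held_out]
        yield held_out, train, test

def summarize(fold_accuracies):
    """Mean and sample standard deviation across folds."""
    return mean(fold_accuracies), stdev(fold_accuracies)

# Hypothetical records; real scans would carry file paths or features.
TOY_SAMPLES = [
    {"scan": "a1", "location": "Set1", "label": "forest"},
    {"scan": "a2", "location": "Set1", "label": "coast"},
    {"scan": "b1", "location": "Set2", "label": "forest"},
    {"scan": "c1", "location": "Set3", "label": "coast"},
]

for held_out, train, test in leave_one_location_out(TOY_SAMPLES):
    print(held_out, len(train), "train /", len(test), "test")
```

Reporting the per-fold accuracies through something like `summarize` gives a mean with a spread, which is easier to compare across methods than a single peak number.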

Circularity Check

0 steps flagged

No significant circularity; empirical dataset release with baselines

The paper presents two MPO datasets (dense FARO and sparse Velodyne scans) collected in Fukuoka along with empirical baseline accuracies for six place categories. No derivation chain, equations, fitted parameters, or uniqueness theorems are invoked. Reported results (96.42% dense, 89.67% sparse) are direct measurements on the released data rather than predictions that reduce to inputs by construction. The work is archival and empirical; the central claim is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a dataset paper, there are no free parameters fitted, no additional axioms beyond standard assumptions in data collection, and no invented entities postulated.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] H. Zender, O. M. Mozos, P. Jensfelt, G.-J. M. Kruijff, and W. Burgard, “Conceptual spatial representations for indoor mobile robots,” Robotics and Autonomous Systems, vol. 56, pp. 493–502, June 2008.

[2] A. Pronobis and P. Jensfelt, “Large-scale semantic mapping and reasoning with heterogeneous modalities,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), (Saint Paul, MN, USA), May 2012.

[3] C. Stachniss, O. M. Mozos, and W. Burgard, “Efficient exploration of unknown indoor environments using a team of mobile robots,” Annals of Mathematics and Artificial Intelligence, vol. 52, pp. 205–227, April 2008.

[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255, IEEE, 2009.

[5] J. Xiao, J. Hays, K. Ehinger, A. Oliva, A. Torralba, et al., “Sun database: Large-scale scene recognition from abbey to zoo,” in Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp. 3485–3492, IEEE, 2010.

[6] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor Segmentation and Support Inference from RGBD Images,” in Computer Vision – ECCV 2012, pp. 746–760, Berlin, Heidelberg: Springer Berlin Heidelberg, Oct. 2012.

[7] S. Song, S. P. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D scene understanding benchmark suite,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 567–576, IEEE, 2015.

[8] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 3354–3361, IEEE, 2012.

[9] “Dense and sparse multi-modal panoramic 3d outdoor (mpo) datasets.” Available at http://robotics.ait.kyushu-u.ac.jp/~kurazume/research-e.php?content=db-hidden

[10] O. M. Mozos, H. Mizutani, R. Kurazume, and T. Hasegawa, “Categorization of indoor places using the kinect sensor,” Sensors, vol. 12, pp. 6695–6711, May 2012.

[11] O. M. Mozos, H. Mizutani, H. Jung, R. Kurazume, and T. Hasegawa, “Categorization of indoor places by combining local binary pattern histograms of range and reflectance data from laser range finders,” Advanced Robotics, vol. 27, pp. 1455–1464, October 2013.

[12] H. Jung, O. M. Mozos, Y. Iwashita, and R. Kurazume, “Local n-ary patterns: a local multi-modal descriptor for place categorization,” Advanced Robotics, pp. 1–14, 2016.

[13] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 971–987, July 2002.

[14] A. E. Johnson and M. Hebert, “Surface matching for object recognition in complex three-dimensional scenes,” Image and Vision Computing, vol. 16, no. 9, pp. 635–651, 1998.

[15] T. Leung and J. Malik, “Representing and recognizing the visual appearance of materials using three-dimensional textons,” Int. J. Comput. Vision, vol. 43, pp. 29–44, June 2001.

[16] C. Cortes and V. Vapnik, “Support-vector network,” Machine Learning, vol. 20, pp. 273–297, 1995.

[17] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.

[18] S. Knerr, L. Personnaz, and G. Dreyfus, “Single-layer learning revisited: a stepwise procedure for building and training a neural network,” in Neurocomputing: Algorithms, Architectures and Applications (J. Fogelman, ed.), Springer-Verlag, 1990.

[19] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

[20] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification.” http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, 2010.

[21] R. S. Boyer and J. S. Moore, MJRTY—a fast majority vote algorithm. Springer, 1991.
