SynthCity: A large scale synthetic point cloud
Pith reviewed 2026-05-24 23:47 UTC · model grok-4.3
The pith
SynthCity supplies a 367.9 million point synthetic urban MLS point cloud with nine-class labels to support deep learning research.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SynthCity is an open dataset consisting of a 367.9M point synthetic full-colour Mobile Laser Scanning point cloud of an urban/suburban environment, with every point assigned a label from one of nine categories and generated using the Blensor plugin for Blender.
What carries the argument
SynthCity dataset: the 367.9 million point synthetic MLS point cloud with nine-class semantic labels produced by Blensor simulation inside Blender.
If this is right
- The dataset removes the need to collect and label millions of real points before training point-cloud classifiers.
- It allows direct experiments on whether synthetic pre-training improves final performance on real MLS scenes.
- Nine-class labels enable supervised learning for standard urban semantic segmentation tasks without additional annotation effort.
- The open release lets multiple groups compare different network architectures on the same large synthetic baseline.
Where Pith is reading between the lines
- If transfer works, hybrid pipelines that mix SynthCity with smaller real datasets could become standard practice.
- The same Blender setup could be varied to produce matched pairs of synthetic and real scans for domain-adaptation studies.
- Success here would motivate similar synthetic generation for other sensors such as terrestrial or airborne laser scanning.
Load-bearing premise
That models trained or pre-trained on this Blender-generated synthetic point cloud will transfer usefully to real-world Mobile Laser Scanning data.
What would settle it
Train a point-cloud segmentation network on SynthCity alone and measure its accuracy on a held-out real MLS urban scan. High accuracy would support the premise; a large drop relative to the same network trained on real data would undermine it.
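The test described above needs a concrete score. The paper does not prescribe one, so the following is only a minimal sketch of one common choice, mean intersection-over-union (mIoU), computed from per-point true and predicted labels. The integer encoding 0..8 for the nine categories and the function names `confusion_matrix`, `per_class_iou`, and `mean_iou` are assumptions made for illustration, not part of the SynthCity release.

```python
from typing import List, Optional, Sequence

NUM_CLASSES = 9  # nine categories per the paper; the ids 0..8 are an assumption


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int = NUM_CLASSES) -> List[List[int]]:
    """Count points by (true class, predicted class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if not (0 <= t < num_classes and 0 <= p < num_classes):
            raise ValueError(f"label out of range: true={t}, pred={p}")
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU = TP / (TP + FP + FN) per class; None if the class never occurs."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over the classes that appear in truth or prediction."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

Running the same function on a network trained on SynthCity and on one trained on real data, both evaluated on the same held-out real scan, would give the two numbers whose gap the test turns on.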
Original abstract
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data to real world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exist which, if proved effective can be exploited. We therefore argue that research in this domain would be of significant use. In this paper we present SynthCity an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is assigned a label from one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SynthCity, a 367.9 million point synthetic full-color Mobile Laser Scanning point cloud dataset generated in a typical urban/suburban environment using the Blensor plugin for Blender. Every point is labeled with one of nine categories. The work is positioned as an open resource to support research on deep learning for point cloud classification, particularly the use of synthetic data for pre-training, while noting that generalization from synthetic to real data remains poorly studied.
Significance. A large-scale, publicly available labeled synthetic MLS point cloud could help address data scarcity in 3D computer vision. Because the manuscript supplies only the generation pipeline and dataset description with no experiments, similarity metrics, or generalization results, any significance is potential and will depend on adoption and validation by subsequent users.
major comments (1)
- [Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) and no baseline training results. This absence is load-bearing for assessing whether the released resource actually advances the stated research goal.
minor comments (1)
- [Dataset generation] The generation pipeline description would benefit from explicit listing of the nine label categories and any post-processing steps applied to the raw Blensor output.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review. We address the major comment point by point below.
- Referee: [Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) or any baseline training results; this absence is load-bearing for assessing whether the released resource actually advances the stated research goal.
Authors: We agree that the absence of quantitative comparisons and baseline results limits the strength of the claim that the dataset advances synthetic-to-real transfer research. The manuscript's core contribution is the release of a large-scale, publicly available labeled synthetic MLS point cloud generated via Blensor, which did not previously exist at this scale. To better support the stated goal, the revised manuscript will incorporate additional quantitative statistics on point density and per-category distributions, drawing comparisons to available real-world MLS datasets where feasible. Full noise characterization, label fidelity metrics, and end-to-end transfer learning baselines would require substantial new experiments outside the scope of a dataset description paper and are better suited to follow-up work that leverages the released resource.
Revision: partial
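The statistics promised in the response (point density and per-category distributions, compared against real MLS data) could be computed along the following lines. This is a hedged sketch, not the authors' procedure: the total variation distance as the comparison measure, the mean number of points per occupied voxel as a density proxy, the integer label ids 0..8, and all function names are assumptions introduced here.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

NUM_CLASSES = 9  # nine categories per the paper; the ids 0..8 are an assumption


def class_proportions(labels: Iterable[int],
                      num_classes: int = NUM_CLASSES) -> List[float]:
    """Fraction of points carrying each label id in 0..num_classes-1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return [counts.get(c, 0) / total for c in range(num_classes)]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Half the L1 distance between two distributions (0 = same, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_points_per_occupied_voxel(points: Iterable[Tuple[float, float, float]],
                                   voxel_size: float) -> float:
    """Crude density proxy: average point count over non-empty cubic voxels."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    voxels = Counter(
        (math.floor(x / voxel_size),
         math.floor(y / voxel_size),
         math.floor(z / voxel_size))
        for x, y, z in points
    )
    if not voxels:
        raise ValueError("no points given")
    return sum(voxels.values()) / len(voxels)
```

Comparing label proportions this way only makes sense once the synthetic and real datasets have been mapped to a shared label set, which real MLS benchmarks do not necessarily provide out of the box.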
Circularity Check
No significant circularity
This is a pure data-release paper whose sole contribution is the generation and public distribution of a 367.9 M point labeled synthetic MLS dataset produced with Blensor/Blender. The abstract and text contain no equations, no fitted parameters, no predictions, and no load-bearing self-citations. The authors explicitly note that synthetic-to-real generalization remains poorly studied and do not assert any performance claim that would require circular justification. Consequently no derivation chain exists that could reduce to its own inputs.
Forward citations
Cited by 1 Pith paper
- Identifying Ethical Biases in Action Recognition Models: The authors create a synthetic video auditing framework that detects statistically significant skin color biases in popular human action recognition models even when actions are identical.