SynthCity: A large scale synthetic point cloud

David Griffiths; Jan Boehm

arxiv: 1907.04758 · v1 · pith:TEHK4BKUnew · submitted 2019-07-10 · 💻 cs.CV

Pith reviewed 2026-05-24 23:47 UTC · model grok-4.3

classification 💻 cs.CV

keywords: synthetic point cloud · mobile laser scanning · labeled dataset · urban environment · point cloud classification · Blender simulation · semantic segmentation · deep learning data

The pith

SynthCity supplies a 367.9 million point synthetic urban MLS point cloud with nine-class labels to support deep learning research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates and releases SynthCity as an open dataset to tackle the limited supply of labeled three-dimensional point cloud data for training classification models. It generates a full-color Mobile Laser Scanning (MLS) point cloud of a typical urban and suburban scene, with every point carrying one of nine semantic labels. The cloud is produced inside Blender using the Blensor plugin. A reader would care because collecting and labeling real MLS data at this scale is costly, so a usable synthetic substitute could lower that barrier, provided that models trained on it also perform well on real scans.

Core claim

SynthCity is an open dataset consisting of a 367.9M point synthetic full-colour Mobile Laser Scanning point cloud of an urban/suburban environment, with every point assigned a label from one of nine categories and generated using the Blensor plugin for Blender.

What carries the argument

SynthCity dataset: the 367.9 million point synthetic MLS point cloud with nine-class semantic labels produced by Blensor simulation inside Blender.

If this is right

The dataset removes the need to collect and label millions of real points before training point-cloud classifiers.
It allows direct experiments on whether synthetic pre-training improves final performance on real MLS scenes.
Nine-class labels enable supervised learning for standard urban semantic segmentation tasks without additional annotation effort.
The open release lets multiple groups compare different network architectures on the same large synthetic baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If transfer works, hybrid pipelines that mix SynthCity with smaller real datasets could become standard practice.
The same Blender setup could be varied to produce matched pairs of synthetic and real scans for domain-adaptation studies.
Success here would motivate similar synthetic generation for other sensors such as terrestrial or airborne laser scanning.
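The hybrid-pipeline idea above can be sketched as a batch sampler that draws a fixed share of every training batch from a small real dataset and the rest from the synthetic one. This is only an illustration under stated assumptions: the function name `mixed_batches`, the `real_fraction` parameter, and the placeholder samples are hypothetical and do not come from the paper, which reports no training experiments.

```python
import random


def mixed_batches(synthetic, real, batch_size, real_fraction, num_batches, seed=0):
    """Yield batches that take a fixed share of samples from the real dataset.

    `synthetic` and `real` are sequences of training samples (hypothetical
    placeholders here). Sampling is with replacement, so a small real set
    can be revisited many times while the larger synthetic set is explored.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_real = round(batch_size * real_fraction)
    n_synth = batch_size - n_real
    for _ in range(num_batches):
        batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_synth)
        rng.shuffle(batch)  # avoid a fixed real-then-synthetic ordering
        yield batch


# Usage with placeholder samples tagged by origin (assumed data, not SynthCity).
synthetic_samples = [("synthetic", i) for i in range(1000)]
real_samples = [("real", i) for i in range(20)]
for batch in mixed_batches(synthetic_samples, real_samples, 8, 0.25, 3):
    print(sum(1 for origin, _ in batch if origin == "real"), "real samples of", len(batch))
```

A fixed per-batch ratio is one simple design choice; whether any particular ratio helps on real MLS scans is exactly the open question the page raises.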

Load-bearing premise

That models trained or pre-trained on this Blender-generated synthetic point cloud will transfer usefully to real-world Mobile Laser Scanning data.

What would settle it

Train a point-cloud segmentation network on SynthCity alone and measure its accuracy on a held-out real MLS urban scan; high accuracy supports the claim while a large drop refutes it.
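One way to score such a test is per-class intersection over union (IoU) between predicted and ground-truth labels on the real scan. The sketch below is an assumption on our part: the paper does not prescribe a metric, and the integer label encoding 0 to 8 for the nine categories is hypothetical.

```python
def per_class_iou(true_labels, pred_labels, num_classes=9):
    """Return a list of per-class IoU values (None for classes absent from both)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    intersection = [0] * num_classes
    true_count = [0] * num_classes
    pred_count = [0] * num_classes
    for t, p in zip(true_labels, pred_labels):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            intersection[t] += 1
    ious = []
    for c in range(num_classes):
        union = true_count[c] + pred_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)
    return ious


def mean_iou(ious):
    """Average IoU over the classes that actually occur."""
    present = [v for v in ious if v is not None]
    if not present:
        raise ValueError("no class is present in either label sequence")
    return sum(present) / len(present)


# Toy example with three classes (made-up labels, not dataset results).
truth = [0, 0, 1, 1, 2]
prediction = [0, 1, 1, 1, 2]
scores = per_class_iou(truth, prediction, num_classes=3)
print(scores, mean_iou(scores))
```

Comparing this score for a model trained only on SynthCity against the same model trained on real data would quantify the "large drop" the paragraph refers to.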

Figures

Figures reproduced from arXiv: 1907.04758 by David Griffiths, Jan Boehm.

**Figure 2.** Rendered image from the initial downloaded model. Im…

**Figure 3.** Total point counts for each label category. Note the log…

Original abstract

With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data to real world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exist which, if proved effective can be exploited. We therefore argue that research in this domain would be of significant use. In this paper we present SynthCity an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is assigned a label from one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents SynthCity, a 367.9 million point synthetic full-color Mobile Laser Scanning point cloud dataset generated in a typical urban/suburban environment using the Blensor plugin for Blender. Every point is labeled with one of nine categories. The work is positioned as an open resource to support research on deep learning for point cloud classification, particularly the use of synthetic data for pre-training, while noting that generalization from synthetic to real data remains poorly studied.

Significance. A large-scale, publicly available labeled synthetic MLS point cloud could help address data scarcity in 3D computer vision. Because the manuscript supplies only the generation pipeline and dataset description with no experiments, similarity metrics, or generalization results, any significance is potential and will depend on adoption and validation by subsequent users.

major comments (1)

[Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) or any baseline training results; this absence is load-bearing for assessing whether the released resource actually advances the stated research goal.

minor comments (1)

[Dataset generation] The generation pipeline description would benefit from explicit listing of the nine label categories and any post-processing steps applied to the raw Blensor output.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review. We address the major comment point by point below.

Referee: [Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) or any baseline training results; this absence is load-bearing for assessing whether the released resource actually advances the stated research goal.

Authors: We agree that the absence of quantitative comparisons and baseline results limits the strength of the claim that the dataset advances synthetic-to-real transfer research. The manuscript's core contribution is the release of a large-scale, publicly available labeled synthetic MLS point cloud generated via Blensor, which did not previously exist at this scale. To better support the stated goal, the revised manuscript will incorporate additional quantitative statistics on point density and per-category distributions, drawing comparisons to available real-world MLS datasets where feasible. Full noise characterization, label fidelity metrics, and end-to-end transfer learning baselines would require substantial new experiments outside the scope of a dataset description paper and are better suited to follow-up work that leverages the released resource.

revision: partial
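The per-category and point-density statistics mentioned in the response could be computed along the following lines. This is a minimal sketch under assumptions: points are plain `(x, y, z)` tuples, labels are integers, and density is approximated as the mean point count per occupied cubic voxel, which is only one of several possible definitions and is not taken from the paper.

```python
import math
from collections import Counter


def label_histogram(labels):
    """Map each label to (count, share of all points)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def mean_voxel_density(points, voxel_size):
    """Mean points per unit volume, averaged over occupied voxels only."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied = Counter(
        (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        for x, y, z in points
    )
    if not occupied:
        raise ValueError("no points given")
    mean_points_per_voxel = sum(occupied.values()) / len(occupied)
    return mean_points_per_voxel / voxel_size ** 3


# Toy data (made up for illustration, not SynthCity values).
toy_points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)]
toy_labels = [0, 0, 1]
print(label_histogram(toy_labels))
print(mean_voxel_density(toy_points, voxel_size=1.0))
```

Running the same two functions on a real MLS scan and on the synthetic cloud would give directly comparable numbers for the distribution comparison the referee asks for.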

Circularity Check

0 steps flagged

No significant circularity

This is a pure data-release paper whose sole contribution is the generation and public distribution of a 367.9 M point labeled synthetic MLS dataset produced with Blensor/Blender. The abstract and text contain no equations, no fitted parameters, no predictions, and no load-bearing self-citations. The authors explicitly note that synthetic-to-real generalization remains poorly studied and do not assert any performance claim that would require circular justification. Consequently no derivation chain exists that could reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper; no mathematical derivations, fitted parameters, axioms, or new postulated entities are introduced.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Identifying Ethical Biases in Action Recognition Models
cs.CV · 2026-04 · unverdicted · novelty 6.0

The authors create a synthetic video auditing framework that detects statistically significant skin color biases in popular human action recognition models even when actions are identical.

