arxiv: 1907.04758 · v1 · pith:TEHK4BKUnew · submitted 2019-07-10 · 💻 cs.CV

SynthCity: A large scale synthetic point cloud

Pith reviewed 2026-05-24 23:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords synthetic point cloud · mobile laser scanning · labeled dataset · urban environment · point cloud classification · Blender simulation · semantic segmentation · deep learning data

The pith

SynthCity supplies a 367.9 million point synthetic urban MLS point cloud with nine-class labels to support deep learning research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates and releases SynthCity as an open dataset to tackle the limited supply of labeled three-dimensional point cloud data for training classification models. It generates a full-color Mobile Laser Scanning (MLS) point cloud of a typical urban and suburban scene, with every point carrying one of nine semantic labels. The cloud is produced inside Blender using the Blensor plugin. A reader would care because collecting and labeling real MLS data at this scale is costly, so a usable synthetic substitute could lower that barrier, provided models trained on it also perform well on real scans.

Core claim

SynthCity is an open dataset consisting of a 367.9M point synthetic full-colour Mobile Laser Scanning point cloud of an urban/suburban environment, with every point assigned a label from one of nine categories and generated using the Blensor plugin for Blender.

What carries the argument

SynthCity dataset: the 367.9 million point synthetic MLS point cloud with nine-class semantic labels produced by Blensor simulation inside Blender.

If this is right

  • The dataset removes the need to collect and label millions of real points before training point-cloud classifiers.
  • It allows direct experiments on whether synthetic pre-training improves final performance on real MLS scenes.
  • Nine-class labels enable supervised learning for standard urban semantic segmentation tasks without additional annotation effort.
  • The open release lets multiple groups compare different network architectures on the same large synthetic baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If transfer works, hybrid pipelines that mix SynthCity with smaller real datasets could become standard practice.
  • The same Blender setup could be varied to produce matched pairs of synthetic and real scans for domain-adaptation studies.
  • Success here would motivate similar synthetic generation for other sensors such as terrestrial or airborne laser scanning.

Load-bearing premise

That models trained or pre-trained on this Blender-generated synthetic point cloud will transfer usefully to real-world Mobile Laser Scanning data.

What would settle it

Train a point-cloud segmentation network on SynthCity alone and measure its accuracy on a held-out real MLS urban scan; high accuracy supports the claim while a large drop refutes it.
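
The test described above can be scored with two standard quantities: overall accuracy and per-class intersection-over-union (IoU). What follows is a minimal sketch, not the paper's evaluation protocol (the paper reports none). The function name `segmentation_scores`, the use of integer class ids, and the choice of IoU alongside accuracy are assumptions made for illustration, and the labels are toy values.

```python
from collections import Counter


def segmentation_scores(true_labels, pred_labels, num_classes=9):
    """Return (overall accuracy, per-class IoU dict, mean IoU).

    Labels are assumed to be integer class ids in range(num_classes);
    this encoding is hypothetical and not taken from the dataset files.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    tp = Counter()  # correctly labeled points, per class
    fp = Counter()  # points wrongly assigned to a class
    fn = Counter()  # points of a class that were missed
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    total = len(true_labels)
    accuracy = sum(tp.values()) / total if total else 0.0

    iou = {}
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        # None marks a class absent from both truth and prediction.
        iou[c] = tp[c] / denom if denom else None

    present = [v for v in iou.values() if v is not None]
    mean_iou = sum(present) / len(present) if present else 0.0
    return accuracy, iou, mean_iou


# Toy example with three classes (made-up labels, not real results).
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
accuracy, iou, mean_iou = segmentation_scores(true_labels, pred_labels, num_classes=3)
print(accuracy, iou, mean_iou)
```

Reporting per-class IoU next to accuracy matters for urban scans because a few large classes can dominate the point count, so overall accuracy alone may hide poor performance on small classes.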

Figures

Figures reproduced from arXiv: 1907.04758 by David Griffiths, Jan Boehm.

Figure 1: Example of the SynthCity dataset displaying a) class la…
Figure 2: Rendered image from the initial downloaded model. Im…
Figure 3: Total point counts for each label category. Note the log…
Original abstract

With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data to real world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exist which, if proved effective can be exploited. We therefore argue that research in this domain would be of significant use. In this paper we present SynthCity an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is assigned a label from one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents SynthCity, a 367.9 million point synthetic full-color Mobile Laser Scanning point cloud dataset generated in a typical urban/suburban environment using the Blensor plugin for Blender. Every point is labeled with one of nine categories. The work is positioned as an open resource to support research on deep learning for point cloud classification, particularly the use of synthetic data for pre-training, while noting that generalization from synthetic to real data remains poorly studied.

Significance. A large-scale, publicly available labeled synthetic MLS point cloud could help address data scarcity in 3D computer vision. Because the manuscript supplies only the generation pipeline and dataset description with no experiments, similarity metrics, or generalization results, any significance is potential and will depend on adoption and validation by subsequent users.

major comments (1)
  1. [Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) or any baseline training results; this absence is load-bearing for assessing whether the released resource actually advances the stated research goal.
minor comments (1)
  1. [Dataset generation] The generation pipeline description would benefit from explicit listing of the nine label categories and any post-processing steps applied to the raw Blensor output.
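
Of the comparisons the referee asks for, point density is the simplest to approximate: bin points into a regular voxel grid and count the points per occupied cell. The sketch below is a minimal pure-Python illustration under stated assumptions. The function name `voxel_density`, the 1.0 unit voxel size, and the toy coordinates are hypothetical and do not come from the paper or from any real scan.

```python
import math
from collections import Counter


def voxel_density(points, voxel_size=1.0):
    """Return (number of occupied voxels, mean points per occupied voxel).

    `points` is an iterable of (x, y, z) tuples in consistent units.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Map each point to the integer index of the voxel that contains it.
    counts = Counter(
        (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        for x, y, z in points
    )
    if not counts:
        return 0, 0.0
    return len(counts), sum(counts.values()) / len(counts)


# Toy clouds (made-up coordinates) standing in for a synthetic and a real scan.
synthetic = [(0.1, 0.1, 0.1), (0.2, 0.4, 0.3), (0.9, 0.9, 0.9), (1.5, 0.2, 0.1)]
real = [(0.1, 0.1, 0.1), (1.5, 0.2, 0.1), (2.5, 0.2, 0.1)]

print(voxel_density(synthetic))
print(voxel_density(real))
```

Comparing the two summaries at the same voxel size gives a first, coarse indication of whether the simulated scanner samples surfaces as densely as a real one. It says nothing about noise or label fidelity, which need separate measurements.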

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review. We address the major comment point by point below.

  1. Referee: [Abstract and Introduction] The central contribution is framed as aiding research on synthetic-to-real transfer, yet the manuscript contains no quantitative comparison of the synthetic data to real MLS distributions (point density, noise, label fidelity) or any baseline training results; this absence is load-bearing for assessing whether the released resource actually advances the stated research goal.

    Authors: We agree that the absence of quantitative comparisons and baseline results limits the strength of the claim that the dataset advances synthetic-to-real transfer research. The manuscript's core contribution is the release of a large-scale, publicly available labeled synthetic MLS point cloud generated via Blensor, which did not previously exist at this scale. To better support the stated goal, the revised manuscript will incorporate additional quantitative statistics on point density and per-category distributions, drawing comparisons to available real-world MLS datasets where feasible. Full noise characterization, label fidelity metrics, and end-to-end transfer learning baselines would require substantial new experiments outside the scope of a dataset description paper and are better suited to follow-up work that leverages the released resource.

    revision: partial
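
The per-category comparison promised in this response could take the form of a normalized label histogram for each dataset plus a single distance between the two. The sketch below is a minimal example under stated assumptions: the total variation distance is our choice of summary, not one named in the paper or the rebuttal, and the function names, integer class ids, and toy label lists are hypothetical.

```python
from collections import Counter


def class_fractions(labels, num_classes=9):
    """Fraction of points in each class, as a list indexed by class id."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [counts[c] / total for c in range(num_classes)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means identical class proportions; 1.0 means no overlap at all.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy label lists (made up) standing in for a synthetic and a real dataset
# that share the same class ids.
synthetic_labels = [0, 0, 1, 2]
real_labels = [0, 1, 1, 2]

p = class_fractions(synthetic_labels, num_classes=3)
q = class_fractions(real_labels, num_classes=3)
print(p, q, total_variation(p, q))
```

A comparison of this kind only makes sense after the two datasets' label sets have been mapped onto a common set of classes, which is itself a judgment call when the category definitions differ.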

Circularity Check

0 steps flagged

No significant circularity

This is a pure data-release paper whose sole contribution is the generation and public distribution of a 367.9 M point labeled synthetic MLS dataset produced with Blensor/Blender. The abstract and text contain no equations, no fitted parameters, no predictions, and no load-bearing self-citations. The authors explicitly note that synthetic-to-real generalization remains poorly studied and do not assert any performance claim that would require circular justification. Consequently no derivation chain exists that could reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper; no mathematical derivations, fitted parameters, axioms, or new postulated entities are introduced.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Identifying Ethical Biases in Action Recognition Models

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    The authors create a synthetic video auditing framework that detects statistically significant skin color biases in popular human action recognition models even when actions are identical.
