arxiv: 1907.04569 · v1 · pith:7T5UI3JAnew · submitted 2019-07-10 · 💻 cs.RO · cs.CV

Generating All the Roads to Rome: Road Layout Randomization for Improved Road Marking Segmentation

Pith reviewed 2026-05-25 00:05 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords road marking segmentation · domain randomization · synthetic data generation · semantic labels · urban environments · data augmentation · autonomous driving

The pith

Randomizing road layouts in semantic labels generates synthetic pairs that raise rare road marking segmentation accuracy by over 12 percentage points on real urban data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates an approach to create additional training examples for road marking segmentation by randomly altering road layouts inside existing semantic labels and then synthesizing new images from those altered labels. This targets the expense of manual labeling and the limited variety found in purely virtual data. A sympathetic reader would care because more accurate detection of uncommon markings such as arrows or symbols can support safer operation of automated vehicles in varied city settings. The method augments real data so that gains appear on rare classes while results on common classes stay the same.

Core claim

By applying domain randomization to road layouts within semantic segmentation labels, new synthetic training pairs are generated that, when used to augment real data, improve mean intersection over union for rare road marking classes by over 12 percentage points in real-world complex urban settings, without degrading results on other classes. This method scales to produce large datasets across domains without additional manual labeling.
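The headline number is a difference in mean intersection over union (mIoU) restricted to rare classes. As a minimal sketch of how such a figure is typically computed, the snippet below evaluates per-class IoU on integer label grids and averages over a chosen subset. The function names, the grid representation, and the choice to skip classes absent from both maps are assumptions made for illustration; the paper's exact evaluation protocol is not given on this page.

```python
from typing import Dict, Iterable, Sequence

Label = Sequence[Sequence[int]]  # 2-D grid of integer class ids (assumed format)


def per_class_iou(pred: Label, truth: Label, class_ids: Iterable[int]) -> Dict[int, float]:
    """Return IoU per class; classes absent from both maps are skipped."""
    ious: Dict[int, float] = {}
    for c in class_ids:
        inter = union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                in_pred, in_truth = p == c, t == c
                inter += in_pred and in_truth   # pixel counted by both maps
                union += in_pred or in_truth    # pixel counted by either map
        if union:  # IoU is undefined (0/0) when the class never appears
            ious[c] = inter / union
    return ious


def mean_iou(ious: Dict[int, float], subset: Iterable[int]) -> float:
    """Mean IoU over the classes in `subset` that have a defined IoU."""
    vals = [ious[c] for c in subset if c in ious]
    if not vals:
        raise ValueError("no class in the subset has a defined IoU")
    return sum(vals) / len(vals)


# Toy example: class 0 is background, classes 1 and 2 stand in for rare markings.
truth = [[0, 0, 1],
         [0, 2, 2]]
pred = [[0, 1, 1],
        [0, 2, 0]]
ious = per_class_iou(pred, truth, class_ids=[0, 1, 2])
rare_miou = mean_iou(ious, subset=[1, 2])
```

A gain "in percentage points" is then 100 times the difference between two such means, one per training setup, evaluated on the same test set.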

What carries the argument

Domain randomization applied to road layouts inside semantic labels to synthesize new training images.
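To make the idea of altering a layout inside a semantic label concrete, here is a deliberately simple sketch: it paints a small marking pattern at a random position that lies entirely on road pixels. This is a hypothetical illustration, not the authors' procedure; the class ids, the `stamp_marking` helper, and the stamping strategy are all assumptions, and the subsequent step of synthesizing an image from the altered label is not shown.

```python
import random
from typing import List, Optional

Grid = List[List[int]]

ROAD = 1          # assumed class id for drivable road surface
RARE_MARKING = 7  # assumed class id for a rare marking (e.g. zigzag)


def stamp_marking(label: Grid, stamp: Grid, rng: random.Random,
                  road_id: int = ROAD, marking_id: int = RARE_MARKING) -> Optional[Grid]:
    """Return a copy of `label` with `stamp` painted at a random spot that lies
    entirely on road pixels, or None if no such spot exists."""
    h, w = len(label), len(label[0])
    sh, sw = len(stamp), len(stamp[0])
    # Every top-left corner where the stamp's bounding box covers only road.
    candidates = [
        (r, c)
        for r in range(h - sh + 1)
        for c in range(w - sw + 1)
        if all(label[r + i][c + j] == road_id
               for i in range(sh) for j in range(sw))
    ]
    if not candidates:
        return None
    r, c = rng.choice(candidates)
    out = [row[:] for row in label]  # leave the input label untouched
    for i in range(sh):
        for j in range(sw):
            if stamp[i][j]:  # non-zero stamp cells become marking pixels
                out[r + i][c + j] = marking_id
    return out


# Toy label: one row of background (0) above three rows of road.
label = [[0] * 5] + [[ROAD] * 5 for _ in range(3)]
stamp = [[1, 0],
         [0, 1]]
new_label = stamp_marking(label, stamp, random.Random(0))
```

Seeding the generator keeps each altered label reproducible, which matters when the same label set has to be regenerated for an ablation.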

If this is right

  • The same procedure can be applied to any domain or driving condition to create large-scale road marking datasets.
  • Performance on frequent road marking classes remains unchanged while rare classes improve.
  • Manual labeling effort for road markings can be reduced or eliminated for many training scenarios.
  • The generated pairs integrate directly into existing segmentation training pipelines for urban environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The randomization principle could transfer to other layout-sensitive segmentation tasks such as lane or curb detection.
  • Wider use might reduce the data gap between simulation and real deployment for perception modules in autonomous systems.
  • Repeated application across many base labels could systematically cover layout variations that are rare in collected data.

Load-bearing premise

Random alterations to road layouts must produce synthetic images close enough to real-world variations to deliver measurable gains without introducing artifacts that reduce overall performance.

What would settle it

Train a segmentation model on real data augmented with the generated pairs, then measure mIoU on a held-out real-world urban test set. No gain, or a drop, for rare classes relative to a baseline trained only on the original data would falsify the central claim.
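The falsification test above reduces to a simple comparison once per-class IoUs are available for both models. The sketch below encodes that decision rule; the function names and the class names are hypothetical, and the IoU values are placeholders chosen for illustration, not results from the paper.

```python
from typing import Dict, Iterable


def mean_over(ious: Dict[str, float], classes: Iterable[str]) -> float:
    """Mean IoU over the named classes."""
    vals = [ious[c] for c in classes]
    return sum(vals) / len(vals)


def delta_pp(baseline: Dict[str, float], augmented: Dict[str, float],
             classes: Iterable[str]) -> float:
    """Change in mean IoU in percentage points (IoU values lie in [0, 1])."""
    classes = list(classes)
    return 100.0 * (mean_over(augmented, classes) - mean_over(baseline, classes))


def claim_falsified(baseline: Dict[str, float], augmented: Dict[str, float],
                    rare_classes: Iterable[str]) -> bool:
    """True when rare-class mIoU shows no gain or a drop over the baseline."""
    return delta_pp(baseline, augmented, rare_classes) <= 0.0


# Placeholder IoUs for illustration only; these are NOT results from the paper.
baseline = {"zigzag": 0.25, "bus_stop": 0.25}
augmented = {"zigzag": 0.5, "bus_stop": 0.5}
rare = ["zigzag", "bus_stop"]
gain = delta_pp(baseline, augmented, rare)
falsified = claim_falsified(baseline, augmented, rare)
```

The same `delta_pp` call on the common classes checks the second half of the claim, namely that performance elsewhere is retained.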

Figures

Figures reproduced from arXiv: 1907.04569 by Horia Porav, Lars Kunze, Paul Newman, Tom Bruls.

Figure 1: Road layout randomization for improved road marking segmen...
Figure 3: We augmented the Oxford Robotcar Dataset with semantic labels...
Figure 4: Examples of newly-synthesized training images for several rare road marking classes (i.e. zigzag, diagonal stripes, bus stop, and small warning...
Figure 5: Examples of the partial labels created by semantically classifying...
Figure 7: Road marking segmentation (full set of classes) in traffic environments with rare classes. The...
Original abstract

Road markings provide guidance to traffic participants and enforce safe driving behaviour; understanding their semantic meaning is therefore paramount in (automated) driving. However, producing the vast quantities of road marking labels required for training state-of-the-art deep networks is costly, time-consuming, and simply infeasible for every domain and condition. In addition, training data retrieved from virtual worlds often lack the richness and complexity of the real world and consequently cannot be used directly. In this paper, we provide an alternative approach in which new road marking training pairs are automatically generated. To this end, we apply principles of domain randomization to the road layout and synthesize new images from altered semantic labels. We demonstrate that training on these synthetic pairs improves mIoU of the segmentation of rare road marking classes during real-world deployment in complex urban environments by more than 12 percentage points, while performance for other classes is retained. This framework can easily be scaled to all domains and conditions to generate large-scale road marking datasets, while avoiding manual labelling effort.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes using domain randomization on road layouts within semantic labels, followed by image synthesis, to automatically generate new synthetic training pairs for road marking segmentation. It reports that models trained on these pairs achieve more than 12 percentage points higher mIoU on rare road marking classes when deployed on real-world complex urban data, while performance on other classes remains unchanged. The approach is positioned as a scalable method to create large datasets without manual labeling effort.

Significance. If the empirical transfer result holds under closer scrutiny, the work would offer a practical route to address data scarcity for rare classes in semantic segmentation for autonomous driving, reducing annotation costs while preserving overall model utility. The emphasis on retaining performance on common classes strengthens the practical value.

major comments (2)
  1. [Experimental results / evaluation] The central empirical claim (abstract) of a >12 pp mIoU gain on rare classes rests on the unverified assumption that layout-randomized synthetic images have visual statistics sufficiently close to real urban scenes. No quantitative checks (e.g., FID, perceptual metrics, or distribution distances) or ablations that isolate the randomization step are described, leaving open the possibility that the reported lift arises from synthesis artifacts rather than improved semantic generalization.
  2. [Abstract and experimental section] The abstract states the improvement occurs 'during real-world deployment in complex urban environments' but provides no details on the size or composition of the real test set, the exact rare classes involved, the baseline model and training protocol, or any statistical significance testing. These omissions make it impossible to judge whether the 12 pp figure is robust or load-bearing for the generalization claim.
minor comments (1)
  1. [Abstract] The abstract refers to 'more than 12 percentage points' without giving the precise delta or the per-class breakdown; adding these numbers would improve clarity.
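The distribution check requested in major comment 1 can be illustrated with a simplified Fréchet distance. The sketch below fits an independent Gaussian to each feature dimension of two feature sets and returns the squared Fréchet distance between them, which for diagonal covariances reduces to the squared difference of means plus the squared difference of standard deviations, summed over dimensions. This is an assumption-laden simplification: standard FID uses Inception features and full covariance matrices, and the feature extractor is taken as given here.

```python
from statistics import fmean, pstdev
from typing import Sequence

Features = Sequence[Sequence[float]]  # one feature vector per image (assumed given)


def diagonal_frechet_distance(feats_a: Features, feats_b: Features) -> float:
    """Squared Fréchet distance between per-dimension Gaussians fitted to two
    feature sets, treating feature dimensions as independent."""
    dims = len(feats_a[0])
    total = 0.0
    for k in range(dims):
        col_a = [f[k] for f in feats_a]
        col_b = [f[k] for f in feats_b]
        total += (fmean(col_a) - fmean(col_b)) ** 2    # mean term
        total += (pstdev(col_a) - pstdev(col_b)) ** 2  # spread term
    return total


# Toy feature sets: identical spread, mean shifted by 3 along the first dimension.
real_feats = [[0.0, 0.0], [2.0, 0.0]]
synth_feats = [[3.0, 0.0], [5.0, 0.0]]
distance = diagonal_frechet_distance(real_feats, synth_feats)
```

A value near zero says the two sets have matching first and second moments per dimension; it does not by itself rule out synthesis artifacts that leave those moments unchanged.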

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve transparency and experimental rigor.

  1. Referee: [Experimental results / evaluation] The central empirical claim (abstract) of a >12 pp mIoU gain on rare classes rests on the unverified assumption that layout-randomized synthetic images have visual statistics sufficiently close to real urban scenes. No quantitative checks (e.g., FID, perceptual metrics, or distribution distances) or ablations that isolate the randomization step are described, leaving open the possibility that the reported lift arises from synthesis artifacts rather than improved semantic generalization.

    Authors: The synthesis pipeline is held fixed; the sole change is the road layout in the semantic label input. Any synthesis-induced visual artifacts are therefore identical across the baseline and randomized training sets and cannot account for the observed differential gain on rare classes when both models are evaluated on the same real-world data. We will add FID scores between the generated images and real urban scenes, plus an ablation that isolates layout randomization while keeping synthesis unchanged. revision: yes

  2. Referee: [Abstract and experimental section] The abstract states the improvement occurs 'during real-world deployment in complex urban environments' but provides no details on the size or composition of the real test set, the exact rare classes involved, the baseline model and training protocol, or any statistical significance testing. These omissions make it impossible to judge whether the 12 pp figure is robust or load-bearing for the generalization claim.

    Authors: We will revise the abstract to include a concise statement of the key experimental parameters and expand the experimental section with the requested details: real test-set size and composition, the specific rare road-marking classes, the baseline model architecture and training protocol, and statistical significance from repeated runs. These additions will make the 12 pp claim fully verifiable. revision: yes
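One way to obtain "statistical significance from repeated runs" is an exact paired sign-flip test on per-seed mIoU differences. The sketch below is a generic illustration, not a procedure the authors describe: the function name is hypothetical, it assumes each run pairs a baseline and an augmented model trained with the same seed, and the scores are placeholders rather than reported results.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(baseline: Sequence[float],
                             augmented: Sequence[float]) -> float:
    """One-sided exact p-value for the hypothesis that the augmented model is
    better, under the null that each paired difference is equally likely to
    carry either sign."""
    if len(baseline) != len(augmented):
        raise ValueError("runs must be paired one-to-one")
    diffs = [a - b for a, b in zip(augmented, baseline)]
    observed = sum(diffs)
    at_least_as_extreme = total = 0
    # Enumerate all 2^n sign assignments; feasible for a handful of runs.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Placeholder rare-class mIoU per seed, in percent; NOT results from the paper.
baseline_runs = [30, 31, 29, 32, 30]
augmented_runs = [43, 44, 41, 45, 42]
p_value = paired_sign_flip_p_value(baseline_runs, augmented_runs)
```

With n paired runs the smallest attainable p-value is 2^-n, so at least five runs are needed before this test can fall below the conventional 0.05 threshold.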

Circularity Check

0 steps flagged

No circularity: empirical data-augmentation result stands on external validation

The paper's central claim is an empirical transfer result: training a segmentation network on images synthesized from randomized semantic road layouts yields >12 pp mIoU gain on rare classes in real urban test data, with no regression on other classes. No equations, fitted parameters, or derivations are present that could reduce the reported improvement to a self-definition or input fit. The method relies on an external synthesis pipeline whose distributional fidelity is tested by downstream real-world performance rather than by construction. Self-citations, if any, are not load-bearing for the headline metric. This is a standard, falsifiable empirical setup whose validity hinges on measured generalization, not on internal re-labeling of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the transferability of randomized synthetic data to real-world performance, which is an assumption drawn from domain-randomization principles rather than independently derived or verified within the abstract.

axioms (1)
  • domain assumption: Domain randomization applied to road layouts can produce synthetic images that improve real-world segmentation generalization.
    The paper's method is built directly on this principle to justify synthesizing new training pairs from altered semantic labels.

pith-pipeline@v0.9.0 · 5712 in / 1183 out tokens · 28106 ms · 2026-05-25T00:05:47.926129+00:00 · methodology
