Generating All the Roads to Rome: Road Layout Randomization for Improved Road Marking Segmentation
Pith reviewed 2026-05-25 00:05 UTC · model grok-4.3
The pith
Randomizing road layouts in semantic labels generates synthetic pairs that raise rare road marking segmentation accuracy by over 12 percentage points on real urban data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying domain randomization to road layouts within semantic segmentation labels, new synthetic training pairs are generated that, when used to augment real data, improve mean intersection over union for rare road marking classes by over 12 percentage points in real-world complex urban settings, without degrading results on other classes. This method scales to produce large datasets across domains without additional manual labeling.
What carries the argument
Domain randomization applied to road layouts inside semantic labels to synthesize new training images.
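As a rough illustration of what altering a road layout inside a semantic label can look like, the sketch below pastes a rare-marking stencil at a random position that lies entirely on road pixels. It is not the paper's procedure: the class IDs (`ROAD`, `BACKGROUND`, `RARE_MARKING`), the function name `randomize_layout`, and the rejection-sampling placement rule are all assumptions made for this example. The resulting label would then be passed to a separate image synthesis model, which is not shown.

```python
import numpy as np

# Hypothetical class IDs; the paper's actual label set is not given here.
BACKGROUND, ROAD, RARE_MARKING = 0, 1, 2

def randomize_layout(label, stencil, rng, max_tries=100):
    """Paste a marking stencil at a random location that lies fully on road.

    label:   (H, W) integer array of class IDs
    stencil: (h, w) boolean array, True where the marking should be drawn
    Returns a new label map; the input array is left unchanged.
    """
    H, W = label.shape
    h, w = stencil.shape
    out = label.copy()
    for _ in range(max_tries):
        top = rng.integers(0, H - h + 1)
        left = rng.integers(0, W - w + 1)
        patch = out[top:top + h, left:left + w]  # view into `out`
        # Accept only if every stencil pixel would land on plain road.
        if np.all(patch[stencil] == ROAD):
            patch[stencil] = RARE_MARKING
            return out
    return out  # no valid placement found; label returned unmodified
```

Rejection sampling keeps the example short; a real pipeline would presumably also respect lane geometry and perspective, which this sketch ignores.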
If this is right
- The same procedure can be applied to any domain or driving condition to create large-scale road marking datasets.
- Performance on frequent road marking classes remains unchanged while rare classes improve.
- Manual labeling effort for road markings can be reduced or eliminated for many training scenarios.
- The generated pairs integrate directly into existing segmentation training pipelines for urban environments.
Where Pith is reading between the lines
- The randomization principle could transfer to other layout-sensitive segmentation tasks such as lane or curb detection.
- Wider use might reduce the data gap between simulation and real deployment for perception modules in autonomous systems.
- Repeated application across many base labels could systematically cover layout variations that are rare in collected data.
Load-bearing premise
Random alterations to road layouts must produce synthetic images close enough to real-world variations to deliver measurable gains without introducing artifacts that reduce overall performance.
What would settle it
Training a segmentation model on the generated pairs and then measuring mIoU on a held-out real-world urban test set; no gain or a drop for rare classes relative to a baseline trained only on the original data would falsify the central claim.
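The comparison described above can be sketched as follows, assuming predictions and ground truth are available as integer class-ID arrays. The helper names (`per_class_iou`, `rare_class_miou_gain`) and the choice of which class IDs count as rare are assumptions for illustration; the paper's own evaluation code and class list are not reproduced here.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Per-class IoU from integer label arrays; NaN for classes that never occur."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground truth, columns are predictions.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)

def rare_class_miou_gain(pred_base, pred_aug, target, num_classes, rare_ids):
    """Rare-class mIoU gain in percentage points (augmented minus baseline)."""
    base = np.nanmean(per_class_iou(pred_base, target, num_classes)[rare_ids])
    aug = np.nanmean(per_class_iou(pred_aug, target, num_classes)[rare_ids])
    return 100.0 * (aug - base)
```

A gain at or below zero on the held-out real test set would count against the central claim; repeating the comparison over several training seeds would show whether a positive gain is stable.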
Original abstract
Road markings provide guidance to traffic participants and enforce safe driving behaviour; understanding their semantic meaning is therefore paramount in (automated) driving. However, producing the vast quantities of road marking labels required for training state-of-the-art deep networks is costly, time-consuming, and simply infeasible for every domain and condition. In addition, training data retrieved from virtual worlds often lack the richness and complexity of the real world and consequently cannot be used directly. In this paper, we provide an alternative approach in which new road marking training pairs are automatically generated. To this end, we apply principles of domain randomization to the road layout and synthesize new images from altered semantic labels. We demonstrate that training on these synthetic pairs improves mIoU of the segmentation of rare road marking classes during real-world deployment in complex urban environments by more than 12 percentage points, while performance for other classes is retained. This framework can easily be scaled to all domains and conditions to generate large-scale road marking datasets, while avoiding manual labelling effort.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes using domain randomization on road layouts within semantic labels, followed by image synthesis, to automatically generate new synthetic training pairs for road marking segmentation. It reports that models trained on these pairs achieve more than 12 percentage points higher mIoU on rare road marking classes when deployed on real-world complex urban data, while performance on other classes remains unchanged. The approach is positioned as a scalable method to create large datasets without manual labeling effort.
Significance. If the empirical transfer result holds under closer scrutiny, the work would offer a practical route to address data scarcity for rare classes in semantic segmentation for autonomous driving, reducing annotation costs while preserving overall model utility. The emphasis on retaining performance on common classes strengthens the practical value.
major comments (2)
- [Experimental results / evaluation] The central empirical claim (abstract) of a >12 pp mIoU gain on rare classes rests on the unverified assumption that layout-randomized synthetic images have visual statistics sufficiently close to real urban scenes. No quantitative checks (e.g., FID, perceptual metrics, or distribution distances) or ablations that isolate the randomization step are described, leaving open the possibility that the reported lift arises from synthesis artifacts rather than improved semantic generalization.
- [Abstract and experimental section] The abstract states the improvement occurs 'during real-world deployment in complex urban environments' but provides no details on the size or composition of the real test set, the exact rare classes involved, the baseline model and training protocol, or any statistical significance testing. These omissions make it impossible to judge whether the 12 pp figure is robust or load-bearing for the generalization claim.
minor comments (1)
- [Abstract] The abstract refers to 'more than 12 percentage points' without giving the precise delta or the per-class breakdown; adding these numbers would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve transparency and experimental rigor.
- Referee: [Experimental results / evaluation] The central empirical claim (abstract) of a >12 pp mIoU gain on rare classes rests on the unverified assumption that layout-randomized synthetic images have visual statistics sufficiently close to real urban scenes. No quantitative checks (e.g., FID, perceptual metrics, or distribution distances) or ablations that isolate the randomization step are described, leaving open the possibility that the reported lift arises from synthesis artifacts rather than improved semantic generalization.
  Authors: The synthesis pipeline is held fixed; the sole change is the road layout in the semantic label input. Any synthesis-induced visual artifacts are therefore identical across the baseline and randomized training sets and cannot account for the observed differential gain on rare classes when both models are evaluated on the same real-world data. We will add FID scores between the generated images and real urban scenes, plus an ablation that isolates layout randomization while keeping synthesis unchanged.
  Revision: yes
- Referee: [Abstract and experimental section] The abstract states the improvement occurs 'during real-world deployment in complex urban environments' but provides no details on the size or composition of the real test set, the exact rare classes involved, the baseline model and training protocol, or any statistical significance testing. These omissions make it impossible to judge whether the 12 pp figure is robust or load-bearing for the generalization claim.
  Authors: We will revise the abstract to include a concise statement of the key experimental parameters and expand the experimental section with the requested details: real test-set size and composition, the specific rare road-marking classes, the baseline model architecture and training protocol, and statistical significance from repeated runs. These additions will make the 12 pp claim fully verifiable.
  Revision: yes
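For readers unfamiliar with the FID check mentioned in the first response, the commonly used definition is given below. It compares Gaussian fits to deep feature embeddings (typically from an Inception network) of real and generated images. The symbols are notation introduced here, not taken from the paper: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ the same for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate closer feature statistics. A low FID alone would not establish that the mIoU gain comes from layout diversity, which is why the ablation is requested alongside it.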
Circularity Check
No circularity: empirical data-augmentation result stands on external validation
The paper's central claim is an empirical transfer result: training a segmentation network on images synthesized from randomized semantic road layouts yields >12 pp mIoU gain on rare classes in real urban test data, with no regression on other classes. No equations, fitted parameters, or derivations are present that could reduce the reported improvement to a self-definition or input fit. The method relies on an external synthesis pipeline whose distributional fidelity is tested by downstream real-world performance rather than by construction. Self-citations, if any, are not load-bearing for the headline metric. This is a standard, falsifiable empirical setup whose validity hinges on measured generalization, not on internal re-labeling of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Domain randomization applied to road layouts can produce synthetic images that improve real-world segmentation generalization.