
arxiv: 2602.21141 · v2 · pith:A3S6OKI3new · submitted 2026-02-24 · 💻 cs.CV

SynthRender and IRIS: Open-Source Framework and Dataset for Bidirectional Sim-Real Transfer in Industrial Object Perception

Pith reviewed 2026-05-21 11:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords synthetic data generation · sim-to-real transfer · domain randomization · industrial object detection · robotic perception · computer vision dataset · bidirectional transfer

The pith

Synthetic data generation with guided domain randomization trains industrial object detectors that reach 95.3 to 99.1 percent mAP@50 on real imagery without real-world fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SynthRender, an open-source framework that creates synthetic training images by first reconstructing 3D assets from physical industrial parts and then applying Guided Domain Randomization to vary lighting, textures, and camera parameters. It pairs this generator with the new IRIS dataset of 32 object classes captured under realistic factory conditions. The central demonstration is that detectors trained exclusively on the resulting synthetic images reach 99.1 percent mAP@50 on a public robotics benchmark, 98.3 percent on an automotive set, and 95.3 percent on IRIS itself. A sympathetic reader cares because this removes the need to collect and annotate thousands of real images for each proprietary part, lowering the cost barrier for deploying perception in semi-uncontrolled industrial settings.

Core claim

The authors claim that an integrated pipeline, combining 2D-to-3D reality-to-simulation asset creation with programmatic Guided Domain Randomization inside SynthRender, produces synthetic images whose statistics are close enough to real industrial camera output that standard detectors trained only on those images generalize to real test sets at the reported mAP levels. They further claim that the accompanying IRIS dataset supplies the controlled conditions needed to measure bidirectional sim-real transfer.

What carries the argument

Guided Domain Randomization inside the SynthRender framework, which systematically varies a small set of rendering parameters chosen to align synthetic image statistics with target real-camera conditions.
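The idea of varying a small, deliberately chosen set of rendering parameters can be illustrated with a minimal sketch. This is not SynthRender's API: the parameter names, the numeric ranges, and the uniform sampling are all assumptions made for illustration, since the paper as summarized here does not report the actual ranges or distributions.

```python
import random
from dataclasses import dataclass

# Hypothetical guided ranges. In practice these would be chosen to bracket
# the conditions of the target real camera setup; the values are invented.
GUIDED_RANGES = {
    "light_intensity_w": (200.0, 1200.0),  # lamp power
    "camera_distance_m": (0.4, 1.2),       # camera-to-object distance
    "texture_hue_shift": (-0.05, 0.05),    # small colour perturbation
}


@dataclass(frozen=True)
class SceneConfig:
    light_intensity_w: float
    camera_distance_m: float
    texture_hue_shift: float


def sample_scene(rng: random.Random, ranges=GUIDED_RANGES) -> SceneConfig:
    """Draw one scene configuration uniformly inside each guided range."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
    return SceneConfig(**values)


def sample_dataset(n: int, seed: int = 0) -> list[SceneConfig]:
    """Draw n configurations from a seeded generator so runs are repeatable."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

Each `SceneConfig` would then be handed to a renderer to produce one annotated image. Narrowing or widening an entry in `GUIDED_RANGES` is the "guided" part of this sketch: the ranges are restricted to plausible target conditions rather than left fully unconstrained.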

If this is right

  • Ablation results yield concrete guidelines on which rendering choices most improve sim-to-real transfer for textured industrial parts.
  • The framework supports both sim-to-real training and real-to-sim evaluation because it supplies matching CAD models and reconstructed meshes.
  • High scores across three distinct benchmarks indicate the approach scales to different object categories and imaging conditions common in factories.
  • Open release of both the generator and the 19,672-annotation IRIS set enables direct replication and extension by other groups working on proprietary parts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parameter-tuning discipline could be applied to other perception tasks such as segmentation or pose estimation in similar constrained environments.
  • Automated search over the Guided Domain Randomization parameter space might further reduce the manual effort needed to adapt the method to new camera setups.
  • Because the method relies on 3D asset reconstruction from real parts, it naturally supports incremental addition of new proprietary objects without redesigning the entire pipeline.

Load-bearing premise

The specific randomization parameters selected in simulation will make the distribution of synthetic images close enough to real industrial photographs that no additional real data or adaptation step is required for good performance.

What would settle it

A side-by-side comparison in which the mAP@50 of a model trained on SynthRender images drops below 80 percent when evaluated on a new real test set whose lighting, background clutter, or sensor noise visibly differs from the ranges used during Guided Domain Randomization.
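For readers unfamiliar with the metric behind this test, the following is a minimal sketch of average precision at an IoU threshold of 0.5 for a single class. It is a simplification, not the evaluation code used in the paper: the function names are hypothetical, all boxes are assumed to come from one image, and each detection is greedily matched to the best still-unmatched ground-truth box. mAP@50 is the mean of this quantity over classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision_50(detections, ground_truth):
    """AP at IoU >= 0.5 for one class.

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Process detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, 0.5
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
        hits.append(best_j >= 0)
    # Precision after each detection.
    precisions, true_positives = [], 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
    # Make the precision curve monotonically non-increasing (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Each true positive raises recall by 1 / len(ground_truth).
    return sum(p for p, hit in zip(precisions, hits) if hit) / len(ground_truth)
```

Under this sketch, the falsification test described above amounts to checking whether the class-averaged value falls below 0.80 on the shifted test set.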

Figures

Figures reproduced from arXiv: 2602.21141 by Adrián Sanchis Reig, Jens Lambrecht, Jörg Krüger, Jose Moises Araya-Martinez, Pablo Rey Valiente, Thushar Tom.

Figure 1: Architecture for efficient domain adaptation, domain …
Figure 2: 3D asset and texture generation as DA approaches.
Figure 3: Functional diagram of SynthRender, illustrating the …
Figure 4: IRIS objects. Synthetic renders of the 32 industrial …
Figure 5: SynthRender on three SOTA object detection models.
Figure 8: Per-class performance comparison of IRIS objects …
Figure 9: Ablation on IRIS training set size (200–3200 images) …
Original abstract

Object perception is fundamental for tasks such as robotic material handling and quality inspection. However, modern supervised deep-learning models require large annotated datasets for robust automation under semi-uncontrolled conditions, a major barrier to widespread deployment with proprietary industrial parts. We address this through an integrated framework combining synthetic data generation and structured empirical evaluation for systematic investigation of bidirectional sim-to-real transfer. Our method integrates 2D-to-3D Reality-to-Simulation techniques for 3D asset creation from physical parts with programmatic Guided Domain Randomization (GDR) via SynthRender, an open-source synthetic image generation framework. Structured ablation studies across multiple benchmarks quantify the impact of individual rendering design choices, yielding practical guidelines for data-efficient synthetic training. To support evaluation under realistic industrial conditions, we introduce the Industrial Real-Sim Imagery Set (IRIS), a 32-class dataset with diverse textures, intra-class variation, strong inter-class similarities, and 19,672 annotations, providing both CAD models and reconstructed meshes for bidirectional sim-to-real benchmarking. Across three industrial benchmarks, the proposed framework achieves highly competitive performance, reaching 99.1% mAP@50 on a public robotics dataset, 98.3% mAP@50 on an automotive benchmark, and 95.3% mAP@50 on IRIS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SynthRender, an open-source framework for synthetic image generation that combines 2D-to-3D Reality-to-Simulation asset creation with programmatic Guided Domain Randomization (GDR). It also releases the IRIS dataset (32 classes, 19,672 annotations, with CAD models and reconstructed meshes). Structured ablation studies across three industrial benchmarks are used to derive practical guidelines, and the framework is reported to achieve 99.1% mAP@50 on a public robotics dataset, 98.3% mAP@50 on an automotive benchmark, and 95.3% mAP@50 on IRIS, all without real-data fine-tuning or domain adaptation.

Significance. If the central claims hold, the work provides a practical, open-source route to data-efficient training for industrial object perception where real annotated data is scarce or proprietary. The bidirectional sim-real design, the new IRIS benchmark, and the ablation-derived guidelines are concrete contributions that could be adopted by robotics and inspection practitioners. The release of both the rendering framework and the dataset with meshes strengthens reproducibility and enables future comparisons.

major comments (2)
  1. [§3.2] §3.2 (Guided Domain Randomization): The high mAP claims rest on the assumption that GDR produces image statistics sufficiently close to the three real test distributions. The manuscript supplies only high-level descriptions of GDR; it does not report the exact parameter ranges, sampling distributions, or any quantitative domain-gap metrics (FID, MMD, or per-channel histogram distances) between the generated SynthRender images and the corresponding real robotics/automotive/IRIS images. This omission is load-bearing because the reported performance may reflect fortunate alignment on the chosen objects and capture conditions rather than a general, validated transfer method.
  2. [§4] §4 (Experiments): The structured ablation studies are presented as evidence that individual rendering choices matter, yet the text does not include error bars, exact train/test splits, or the number of random seeds used. Without these, it is difficult to determine whether the reported gains (e.g., from adding GDR) are statistically reliable or sensitive to post-hoc choices of hyperparameters.
minor comments (2)
  1. [Abstract and §4.1] The abstract and §4.1 do not state the object detector backbone or training protocol (e.g., YOLOv8, Faster R-CNN, input resolution, optimizer). Adding these details would improve reproducibility.
  2. [Table 1] Table 1 (dataset statistics) lists 19,672 annotations but does not clarify the number of distinct images or the precise train/validation/test partition used for the IRIS benchmark; this information is needed to interpret the 95.3% mAP figure.
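Major comment 1 above asks for quantitative domain-gap metrics and names per-channel histogram distances as one option. A minimal sketch of such a measure follows. The function names, the 16-bin setting, and the choice of L1 distance are assumptions for illustration; neither the paper nor the report specifies a formulation. Images are represented as plain lists of 8-bit `(r, g, b)` tuples to keep the example free of dependencies.

```python
def channel_histogram(pixels, channel, bins=16):
    """Normalised histogram of one 8-bit channel.

    pixels: non-empty list of (r, g, b) tuples with integer values in 0..255.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for px in pixels:
        # Map 0..255 onto bin indices 0..bins-1.
        counts[px[channel] * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_l1_distance(pixels_a, pixels_b, bins=16):
    """Per-channel L1 distance between normalised histograms.

    Returns one value per channel (r, g, b), each in the range [0, 2]:
    0 means identical histograms, 2 means no overlapping bins.
    """
    distances = []
    for channel in range(3):
        hist_a = channel_histogram(pixels_a, channel, bins)
        hist_b = channel_histogram(pixels_b, channel, bins)
        distances.append(sum(abs(p - q) for p, q in zip(hist_a, hist_b)))
    return tuple(distances)
```

To compare a synthetic set against a real one, the pixel lists of all images in each set could be concatenated before calling `histogram_l1_distance`. A histogram distance only captures colour statistics, so it would complement rather than replace a feature-space metric such as FID.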

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate the revisions we will make to improve clarity, reproducibility, and statistical rigor.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Guided Domain Randomization): The high mAP claims rest on the assumption that GDR produces image statistics sufficiently close to the three real test distributions. The manuscript supplies only high-level descriptions of GDR; it does not report the exact parameter ranges, sampling distributions, or any quantitative domain-gap metrics (FID, MMD, or per-channel histogram distances) between the generated SynthRender images and the corresponding real robotics/automotive/IRIS images. This omission is load-bearing because the reported performance may reflect fortunate alignment on the chosen objects and capture conditions rather than a general, validated transfer method.

    Authors: We agree that the current description of Guided Domain Randomization is insufficiently detailed for full reproducibility and that quantitative domain-gap metrics would strengthen the validation of the transfer method. In the revised manuscript we will expand §3.2 to list the exact parameter ranges and sampling distributions employed for each randomization factor. We will also compute and report Fréchet Inception Distance (FID) scores between the SynthRender-generated images and the real images of each benchmark (robotics, automotive, and IRIS) to provide a direct quantitative measure of distributional similarity. revision: yes

  2. Referee: [§4] §4 (Experiments): The structured ablation studies are presented as evidence that individual rendering choices matter, yet the text does not include error bars, exact train/test splits, or the number of random seeds used. Without these, it is difficult to determine whether the reported gains (e.g., from adding GDR) are statistically reliable or sensitive to post-hoc choices of hyperparameters.

    Authors: We acknowledge that the absence of error bars and explicit experimental protocol details limits the ability to assess statistical reliability. The ablation experiments were run with five independent random seeds, and train/test splits followed the official protocols of each public benchmark. In the revised manuscript we will add error bars (mean ± standard deviation across seeds) to all ablation tables and figures in §4 and will explicitly document the number of seeds together with the precise train/test split ratios used for every experiment. revision: yes
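The reporting format promised in the second response (mean ± standard deviation across seeds) can be sketched as below. The helper names are hypothetical and the scores are placeholder numbers, not results from the paper. The sketch uses the sample standard deviation, which is an assumption; the response does not say which estimator the authors intend.

```python
from statistics import mean, stdev


def summarise_runs(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a deviation")
    return mean(scores), stdev(scores)


def format_result(scores, digits=1):
    """Format per-seed scores as 'mean ± std' for a results table."""
    m, s = summarise_runs(scores)
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Placeholder mAP@50 values for five seeds (invented for illustration).
placeholder_scores = [90.0, 92.0, 91.0, 89.0, 93.0]
print(format_result(placeholder_scores))
```

With only five seeds the deviation estimate is itself noisy, so a gain between two ablation settings is more convincing when the intervals are clearly separated.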

Circularity Check

0 steps flagged

No significant circularity; empirical benchmarks are self-contained


The manuscript presents an open-source synthetic rendering framework (SynthRender) with Guided Domain Randomization and a new 32-class dataset (IRIS) for sim-to-real transfer evaluation. All reported results are direct empirical mAP@50 measurements on three separate industrial benchmarks (99.1% on public robotics data, 98.3% on automotive, 95.3% on IRIS) obtained from ablation studies. These outcomes are measured against held-out real test images and do not reduce to any fitted parameter, self-definition, or self-citation chain. No equations, uniqueness theorems, or ansatz adoptions appear in the provided text that would create a circular derivation; the performance numbers are falsifiable experimental observations rather than constructed predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard computer-vision assumptions about the utility of synthetic data and domain randomization; no new physical entities or ad-hoc constants are introduced beyond typical rendering parameters.

axioms (1)
  • domain assumption: Synthetic images generated via 3D reconstruction and domain randomization can produce training distributions that support high-accuracy models on real industrial imagery.
    Invoked implicitly when claiming that models trained only on SynthRender data reach 95-99% mAP on real benchmarks.

pith-pipeline@v0.9.0 · 5794 in / 1344 out tokens · 32262 ms · 2026-05-21T11:48:44.549198+00:00 · methodology


