pith. machine review for the scientific record.

arxiv: 2604.26633 · v1 · submitted 2026-04-29 · 💻 cs.CV · cs.AI

Recognition: unknown

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 13:33 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords synthetic data generation · industrial defect detection · diffusion models · LoRA adaptation · data augmentation · surface defect segmentation · vision language models · ball screw inspection

The pith

An end-to-end pipeline produces synthetic industrial surface defects that, when added to real data, preserve or modestly improve detector performance instead of replacing real samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses data scarcity in learning-based industrial defect detection by building a complete generative pipeline that starts from vision-language model prompts, applies LoRA-adapted diffusion with mask-guided inpainting, and filters outputs using DreamSim and CLIPScore to create automatically labeled synthetic defects. Evaluation on a ball screw drive pitting dataset shows that training detectors solely on these synthetics underperforms real data, yet mixing the two maintains accuracy and produces small gains in certain training regimes for models such as YOLOv26, YOLOX, and LW-DETR. The same pipeline structure transfers to a mobile phone screen defect segmentation task after domain-specific adaptation and quality controls, confirming that its value lies in strengthening limited real datasets rather than substituting for them. A sympathetic reader would care because real defect collection remains slow and costly, so reliable augmentation methods could shorten development cycles for inspection systems.

Core claim

The central discovery is that the described pipeline generates realistic synthetic defects whose combination with real samples preserves downstream detector performance on the BSData pitting task and carries over to the MSD dataset, while purely synthetic training falls short and the pipeline requires careful prompt design, LoRA selection, and filtering to avoid unhelpful artifacts.

What carries the argument

The SynSur end-to-end pipeline, which chains VLM prompt construction, LoRA-adapted diffusion inpainting guided by defect masks, automatic label derivation, and DreamSim/CLIPScore filtering to produce usable synthetic training samples.
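Of these stages, the filtering step is the easiest to make concrete. As a rough sketch (not the authors' code): the field names, score directions, and thresholds below are hypothetical stand-ins for the paper's actual DreamSim/CLIPScore configuration.

```python
# Illustrative sketch of the sample-filtering stage only; the threshold
# values and score semantics are assumptions, not the paper's settings.

def filter_candidates(candidates, max_dreamsim=0.6, min_clipscore=0.25):
    """Keep candidate patches that look close enough to real defects
    (low perceptual distance) and match the defect prompt (high
    image-text alignment)."""
    kept = []
    for patch in candidates:
        realistic = patch["dreamsim"] <= max_dreamsim    # distance to real exemplars
        on_prompt = patch["clipscore"] >= min_clipscore  # prompt alignment
        if realistic and on_prompt:
            kept.append(patch)
    return kept

candidates = [
    {"id": 0, "dreamsim": 0.40, "clipscore": 0.30},  # plausible and on-prompt
    {"id": 1, "dreamsim": 0.90, "clipscore": 0.35},  # too far from real data
    {"id": 2, "dreamsim": 0.50, "clipscore": 0.10},  # realistic but off-prompt
]
kept_ids = [p["id"] for p in filter_candidates(candidates)]
print(kept_ids)  # → [0]
```

Dual-threshold filtering of this shape is what lets the pipeline discard both unrealistic generations and realistic-but-irrelevant ones before they reach detector training.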

If this is right

  • Synthetic-only training produces lower detector performance than real data alone on the evaluated industrial tasks.
  • Adding filtered synthetic defects to real data maintains or slightly raises performance in selected BSData regimes for the tested detector architectures.
  • The overall pipeline transfers to a second domain such as mobile phone screen defects, but requires domain-specific LoRA adaptation and annotation-quality checks.
  • Analysis of individual stages shows that prompt construction, LoRA choice, and sample filtering determine which synthetics prove useful downstream.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be most helpful for defect classes that appear very rarely in real collections, where even modest synthetic additions might stabilize training.
  • The filtering metrics might generalize to other generative models if the same realism and usefulness criteria are applied.
  • Extending the pipeline to generate defects on entirely new surface types would test how much domain adaptation is truly required each time.

Load-bearing premise

The generated synthetic samples are realistic and distributionally close enough to real defects that mixing them into training sets improves or at least does not degrade detector performance.

What would settle it

Retraining the same detectors on real BSData splits plus the pipeline's synthetic samples yields consistently lower mAP or F1 scores than real data alone across multiple random splits and hyperparameter settings.
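That decisive comparison could be scored as follows, assuming per-split mAP numbers are already in hand (the figures below are invented for illustration). The point is that refutation requires the mixture to lose on every split, not just on the average.

```python
from statistics import mean

# Invented per-split mAP scores; in the paper's setting these would
# come from retraining each detector on multiple random BSData splits.
real_only     = [0.71, 0.69, 0.73, 0.70, 0.72]
real_plus_syn = [0.72, 0.70, 0.72, 0.71, 0.73]

deltas = [m - r for m, r in zip(real_plus_syn, real_only)]
consistently_worse = all(d < 0 for d in deltas)

print(round(mean(deltas), 3))  # → 0.006
print(consistently_worse)      # → False
```

With these example numbers the mixture wins on four of five splits, so the "consistently lower" refutation condition is not met.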

Figures

Figures reproduced from arXiv: 2604.26633 by Arjan Kuijper, Mika Pommeranz, Paul Julius Kühn, Saptarshi Neil Sinha.

Figure 1. Overview of the proposed end-to-end pipeline for synthetic defect data generation.
Figure 2. Synthetic mask (a), corresponding defect-free crop (b), and defect patch (c) generated from inputs (a) and (b).
Figure 3. Examples of pitting defects on a ball screw drive spindle.
Figure 4. Heatmaps of defect locations for the two retained image resolutions.
Figure 5. Data samples with multiple scratches (left) and a single scratch (right).
Figure 6. Representative outputs of the four LoRA [18] …
Figure 7. A synthetic defect sample generated by the top-performing LoRA [18] …
Figure 8. BSData [38] prompt derived from frequent Qwen [48] tags and light manual pruning. The prompt emphasizes material, morphology, texture, and recording conditions relevant to pitting defects.
Figure 9. Final MSD [52] prompt derived from frequent Qwen2-VL [48] tags and light manual pruning. The prompt emphasizes scratch geometry, reflective display appearance, and controlled acquisition conditions.
Figure 10. Limitations of Flux.1-dev [5, 21] without finetuning. (a–b) Unconditional generations: prompt optimization alone fails to produce domain-consistent defect appearances for BSData. (c–f) Inpainting without LoRA [18] adaptation: given the same image patch and mask (c–d), Flux.1-dev yields domain-inconsistent defect structures regardless of the prompt used (e–f).
Figure 11. Ranking extremes for synthetic patches on BSData [38] …
Figure 12. Representative synthetic samples. Top row: BSData [38]; bottom row: MSD [52]. Left (a,b,e,f): successful generations exhibiting plausible defect placement and realistic morphology. Right (c,d,g,h): typical failure cases, including boundary overlap, geometric distortion, and mask spillover artifacts.
read the original abstract

Learning-based industrial defect detection is often limited not by model capacity, but by the scarcity of labeled defect data: defects are rare, annotations are expensive, and collecting balanced training sets is slow. We present an end-to-end pipeline for synthetic defect generation and annotation, combining Vision-Language-Model-based prompts, LoRA-adapted diffusion, mask-guided inpainting, and sample filtering with automatic label derivation, and demonstrate the potential of augmenting real data with realistic synthetic samples to overcome data scarcity. The evaluation is conducted on BSData, a challenging dataset of pitting defects on ball screw drives, and then on a subset of the Mobile phone screen surface defect segmentation (MSD) dataset to test cross-domain transfer. Beyond downstream detector performance, we analyze key stages of the pipeline, including prompt construction, LoRA selection, and sample filtering with DreamSim and CLIPScore, to understand which synthetic samples are both realistic and useful. Experiments with YOLOv26, YOLOX, and LW-DETR show that synthetic-only training does not replace real data. When combined with real data, synthetic defects can preserve performance and yield modest gains in selected BSData training regimes. The MSD transfer study shows that the overall pipeline structure carries over to a second industrial inspection domain, while also highlighting the importance of domain-specific adaptation and annotation-quality control. Overall, the paper provides an end-to-end assessment of diffusion-based industrial defect synthesis and shows that its strongest value lies in strengthening scarce real datasets rather than substituting for them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents SynSur, an end-to-end pipeline for synthetic industrial surface defect generation that integrates VLM-based prompt construction, LoRA-adapted diffusion models, mask-guided inpainting, automatic label derivation, and filtering via DreamSim and CLIPScore. It evaluates the pipeline on the BSData dataset of pitting defects on ball screw drives using detectors including YOLOv26, YOLOX, and LW-DETR, and tests cross-domain transfer on a subset of the MSD mobile phone screen defect dataset. Key findings are that synthetic-only training fails to replace real data, while mixing synthetics with real data preserves performance and yields modest gains in selected BSData regimes; the overall pipeline structure transfers to MSD but requires domain-specific adaptation and annotation quality control. The work also analyzes pipeline stages such as prompt construction, LoRA selection, and sample filtering to identify realistic and useful synthetics.

Significance. If the central empirical claims hold after controls, the paper offers a practical, analyzed pipeline for augmenting scarce labeled defect data in industrial inspection tasks, where data collection is costly. It provides concrete multi-detector evaluations on two datasets demonstrating the pattern that synthetics supplement rather than substitute real data, plus a transfer study highlighting domain adaptation needs. Credit is due for the end-to-end assessment, stage-wise analysis of the generative components, and reproducible-style empirical setup with multiple detectors. The result would be useful for practitioners facing data scarcity but is not a fundamental theoretical advance.

major comments (1)
  1. [Experimental evaluation / BSData results] The experimental evaluation (as summarized in the abstract and described in the results) does not control for total training set size when reporting gains from real + synthetic mixtures on BSData. Adding filtered synthetics necessarily increases the effective sample count relative to real-only baselines, so the modest gains in selected regimes could arise from data quantity, generic augmentation effects, or the specific defect distribution and realism produced by the VLM-LoRA-inpainting-DreamSim/CLIPScore pipeline. A volume-matched control (e.g., real-data duplication or standard augmentations to equal total size) is needed to isolate the contribution of the generative components; without it, the claim that the synthetics are distributionally complementary remains under-supported.
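A volume-matched control of the kind the referee describes could be sketched like this; the sample names and the duplication-based padding are illustrative stand-ins for real-data augmentation, not the authors' protocol.

```python
import random

def volume_matched_real(real_samples, target_size, seed=0):
    """Pad a real-only training list to target_size by duplicating
    randomly chosen real samples, so its size matches a real+synthetic
    mixture; a standard-augmentation variant would transform the
    duplicates instead of copying them verbatim."""
    rng = random.Random(seed)
    padded = list(real_samples)
    while len(padded) < target_size:
        padded.append(rng.choice(real_samples))
    return padded

real = [f"real_{i}" for i in range(120)]
synthetic = [f"syn_{i}" for i in range(80)]

mixture = real + synthetic                         # what the paper trains on
control = volume_matched_real(real, len(mixture))  # same size, real data only

print(len(mixture), len(control))  # → 200 200
print(all(s.startswith("real_") for s in control))  # → True
```

Comparing detector scores on `control` versus `mixture` would then isolate whatever the synthetics contribute beyond raw sample count.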

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. The major comment on experimental controls is addressed point-by-point below. We agree that additional controls are warranted and will revise the manuscript to incorporate them.

read point-by-point responses
  1. Referee: The experimental evaluation (as summarized in the abstract and described in the results) does not control for total training set size when reporting gains from real + synthetic mixtures on BSData. Adding filtered synthetics necessarily increases the effective sample count relative to real-only baselines, so the modest gains in selected regimes could arise from data quantity, generic augmentation effects, or the specific defect distribution and realism produced by the VLM-LoRA-inpainting-DreamSim/CLIPScore pipeline. A volume-matched control (e.g., real-data duplication or standard augmentations to equal total size) is needed to isolate the contribution of the generative components; without it, the claim that the synthetics are distributionally complementary remains under-supported.

    Authors: We acknowledge that this is a valid concern and that the current results do not fully isolate the contribution of the generative pipeline from simple increases in training set size. The modest gains observed when mixing real and synthetic data on BSData could indeed partly stem from data quantity rather than the specific realism or distributional properties of the SynSur-generated defects. In the revised manuscript, we will add volume-matched control experiments. These will augment the real-only baselines using standard techniques (e.g., random flips, rotations, scaling, and color jitter) or sample duplication to equalize total training set sizes with the real + synthetic mixtures. Performance will be re-reported for YOLOv26, YOLOX, and LW-DETR on BSData, allowing direct comparison to determine whether the synthetics provide complementary value beyond quantity. We believe this strengthens the empirical support for our claims without altering the core findings. revision: yes

Circularity Check

0 steps flagged

Empirical pipeline evaluation is self-contained with no circular reductions

full rationale

The paper describes a generative pipeline (VLM prompts, LoRA-adapted diffusion, mask inpainting, DreamSim/CLIPScore filtering) and reports downstream detector performance on external real datasets (BSData pitting defects and MSD subset). All performance claims rest on experimental comparisons to held-out real data rather than any internal equations, fitted parameters renamed as predictions, or self-citation chains. No load-bearing step reduces a claimed result to a quantity defined by the paper's own inputs or prior self-citations; the work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents exhaustive enumeration; the pipeline implicitly relies on standard assumptions that diffusion models can be domain-adapted to produce distributionally useful defect images and that automatic filtering metrics correlate with downstream utility.

pith-pipeline@v0.9.0 · 5587 in / 1122 out tokens · 36042 ms · 2026-05-07T13:33:39.099585+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    AnomalyControl: Few-shot anomaly generation by ControlNet inpainting

    Musawar Ali, Nicola Fioraio, Samuele Salti, and Luigi Di Stefano. AnomalyControl: Few-shot anomaly generation by ControlNet inpainting. 12:192903–192914

  2. [2]

    Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, and David J. Fleet. Synthetic data from diffusion models improves imagenet classification

  3. [3]

    A comprehensive survey on machine learning driven material defect detection, 2025

    Jun Bai, Di Wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, and Ji Zhang. A comprehensive survey on machine learning driven material defect detection, 2025. Accessed: 2026-01-26

  4. [4]

MVTec AD — a comprehensive real-world dataset for unsupervised anomaly detection

    Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD — a comprehensive real-world dataset for unsupervised anomaly detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9584–9592, 2019

  5. [5]

    Black Forest Labs. FLUX

  6. [6]

    YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. ArXiv, abs/2004.10934, 2020

  7. [7]

    Sam 3: Segment anything with concepts, 2025

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

  8. [8]

End-to-end object detection with transformers

    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. ArXiv, abs/2005.12872, 2020

  9. [9]

    LW-DETR: A transformer replacement to YOLO for real-time detection

    Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, and Jingdong Wang. LW-DETR: A transformer replacement to YOLO for real-time detection

  10. [10]

    Instructblip: towards general-purpose vision-language models with instruction tuning

Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. Instructblip: towards general-purpose vision-language models with instruction tuning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc

  11. [11]

    Blenderproc: Reducing the reality gap with photorealistic rendering

Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Dmitry Olefir, Tomas Hodan, Youssef Zidan, Mohamad Elbadrawy, Markus Knauer, Harinandan Katam, and Ahsan Lodhi. Blenderproc: Reducing the reality gap with photorealistic rendering. In 16th Robotics: Science and Systems, RSS 2020, Workshops, July 2020. Video presentation: https://www.youtube.com...

  12. [12]

    Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY, USA, 2021. Curran Associates Inc

  13. [13]

    Review of surface-defect detection methods for industrial products based on machine vision.IEEE Access, 13:90668–90697, May 2025

    Wei Fang, Mengnan Wang, Jiadong Sun, Deji Chen, and Pei Shi. Review of surface-defect detection methods for industrial products based on machine vision.IEEE Access, 13:90668–90697, May 2025

  14. [14]

    Dreamsim: Learning new dimensions of human visual similarity using synthetic data, 2023

    Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, and Phillip Isola. Dreamsim: Learning new dimensions of human visual similarity using synthetic data, 2023

  15. [15]

    YOLOX: Exceeding YOLO series in 2021

    Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun. YOLOX: Exceeding YOLO series in 2021

  16. [16]

    CLIPScore: A reference-free evaluation metric for image captioning

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning

  17. [17]

    Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA, 2020. Curran Associates Inc

  18. [18]

LoRA: Low-rank adaptation of large language models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Wang Lu, and Weizhu Chen. LoRA: Low-rank adaptation of large language models

  19. [19]

    Saksham Jain, Gautam Seth, Arpit Paruthi, Umang Soni, and G. Kumar. Synthetic data augmentation for surface defect detection and classification using deep learning. 33

  20. [20]

    A survey of surface defect detection of industrial products based on a small number of labeled data

    Qifan Jin and Li Chen. A survey of surface defect detection of industrial products based on a small number of labeled data

  21. [21]

    Flux.1 kontext: Flow matching for in-context image generation and editing in latent space, 2025

    Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

  22. [22]

    Synthetic data generation for surface defect detection

Déborah Lebert, Jérémy Plouzeau, Jean-Philippe Farrugia, Florence Danglade, and Frédéric Merienne. Synthetic data generation for surface defect detection. In Extended Reality, pages 198–208. Springer Nature Switzerland

  23. [23]

    Cutpaste: Self-supervised learning for anomaly detection and localization, 2021

    Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. Cutpaste: Self-supervised learning for anomaly detection and localization, 2021

  24. [24]

    Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

  25. [25]

    Microsoft COCO: Common Objects in Context

    Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014

  26. [26]

    Visual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023.

  27. [27]

DefectGAN: Synthetic data generation for EMU defects detection with limited data

    Scarlett Liu, Hai Ni, Chao Li, Yukang Zou, and Yiping Luo. DefectGAN: Synthetic data generation for EMU defects detection with limited data. 24(11):17638–17652

  28. [28]

    Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer, 2024

    Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer, 2024

  29. [29]

    Do we need all the synthetic data? targeted synthetic image augmentation via diffusion models

    Dang Nguyen, Jiping Li, Jinghao Zheng, and Baharan Mirzasoleiman. Do we need all the synthetic data? targeted synthetic image augmentation via diffusion models

  30. [30]

    Learning transferable visual models from natural language supervision, 2021

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021

  31. [31]

    You only look once: Unified, real-time object detection

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016

  32. [32]

    Yolov3: An incremental improvement, 2018

    Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement, 2018

  33. [33]

    Rf-detr: Neural architecture search for real-time detection transformers, 2025

    Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, and Neehar Peri. Rf-detr: Neural architecture search for real-time detection transformers, 2025

  34. [34]

    High-resolution image synthesis with latent diffusion models, 2022

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022

  35. [35]

Towards total recall in industrial anomaly detection

    Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler. Towards total recall in industrial anomaly detection. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14298–14308, 2022

  36. [36]

    Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22500–22510, 2023

  37. [37]

    YOLO26: Key architectural enhancements and performance benchmarking for real-time object detection

    Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda, and Manoj Karkee. YOLO26: Key architectural enhancements and performance benchmarking for real-time object detection

  38. [38]

    Industrial machine tool component surface defect dataset

    Tobias Schlagenhauf and Magnus Landwehr. Industrial machine tool component surface defect dataset. 39

  39. [39]

    Diversity is definitely needed: Improving model-agnostic zero-shot classification via stable diffusion, 2023

    Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, and Clinton Fookes. Diversity is definitely needed: Improving model-agnostic zero-shot classification via stable diffusion, 2023

  40. [40]

    6d strawberry pose estimation: Real-time and edge ai solutions using purely synthetic training data, 2025

    Saptarshi Neil Sinha, Julius Kühn, Mika Silvan Goschke, and Michael Weinmann. 6d strawberry pose estimation: Real-time and edge ai solutions using purely synthetic training data, 2025

  41. [41]

    DefectFill: Realistic defect generation with inpainting diffusion model for visual inspection

    Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, and Sungroh Yoon. DefectFill: Realistic defect generation with inpainting diffusion model for visual inspection

  42. [42]

Image-to-image translation with conditional adversarial networks

    Marjana Tahmid, Md. Samiul Alam, Namratha Rao, and Kazi Muhammad Asif Ashrafi. Image-to-image translation with conditional adversarial networks. In 2023 IEEE 9th International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), pages 1–5, 2023

  43. [43]

    DefectGen: Few-shot defect image generation using stable diffusion for steel surface analysis

Adnan Md Tayeb, Hope Leticia Nakayiza, Heejae Shin, Seungmin Lee, Chaesoo Lee, YeongHun Lee, Dong-Seong Kim, and Jae-Min Lee. DefectGen: Few-shot defect image generation using stable diffusion for steel surface analysis. In 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), pages 2087–2092

  44. [44]

    Defectdiffusion: A generative diffusion model for robust data augmentation in industrial defect detection

Adnan Md Tayeb, Hope Leticia Nakayiza, Heejae Shin, Seungmin Lee, Jae-Min Lee, and Dong-Seong Kim. Defectdiffusion: A generative diffusion model for robust data augmentation in industrial defect detection. In 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pages 66–71

  45. [45]

Domain randomization for transferring deep neural networks from simulation to the real world

    Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30, 2017

  46. [46]

    Training deep networks with synthetic data: Bridging the reality gap by domain randomization

Jonathan Tremblay, Aayush Prakash, David Acuna, Mark Brophy, Varun Jampani, Cem Anil, Thang To, Eric Cameracci, Shaad Boochoon, and Stan Birchfield. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1082–10828, 2018

  47. [47]

    Falling things: A synthetic dataset for 3d object detection and pose estimation, 2018

Jonathan Tremblay, Thang To, and Stan Birchfield. Falling things: A synthetic dataset for 3d object detection and pose estimation, 2018.

  48. [48]

    Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024

    Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024

  49. [49]

    A systematic review and evaluation of synthetic simulated data generation strategies for deep learning applications in construction

    Liqun Xu, Hexu Liu, Bo Xiao, Xiaowei Luo, Dharmaraj Veeramani, and Zhenhua Zhu. A systematic review and evaluation of synthetic simulated data generation strategies for deep learning applications in construction. 62

  50. [50]

    Qwen3 technical report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  51. [51]

DRÆM – a discriminatively trained reconstruction embedding for surface anomaly detection

    Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. DRÆM – a discriminatively trained reconstruction embedding for surface anomaly detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 8310–8319, 2021

  52. [52]

    FDSNeT: An accurate real-time surface defect segmentation network

Jian Zhang, Runwei Ding, Miaoju Ban, and Tianyu Guo. FDSNeT: An accurate real-time surface defect segmentation network. In ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3803–3807

  53. [53]

    Adding conditional control to text-to-image diffusion models, 2023

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023

  54. [54]

    Detrs beat yolos on real-time object detection, 2023

    Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, and Jie Chen. Detrs beat yolos on real-time object detection, 2023

  55. [55]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017