Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network
S. Feng et al.
Pith reviewed 2026-05-10 02:47 UTC · model grok-4.3
The pith
A diffusion model trained only on normal samples generates realistic defects to train an asymmetric teacher-student network for unsupervised industrial surface defect detection and localization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training a denoising diffusion probabilistic model solely on normal samples and using it to generate defects through constant-variance Gaussian perturbations and Perlin noise-based masks, then employing these in an asymmetric teacher-student network with cosine similarity and pixel-wise losses, allows the model to achieve 98.4% image-level AUROC and 98.3% pixel-level AUROC for unsupervised defect detection and localization on the MVTecAD dataset.
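The Perlin-mask step named in this claim can be sketched in NumPy as single-octave 2D gradient (Perlin) noise thresholded into a binary mask, the same style of mask generation used by DRAEM [22]. This is a minimal illustration under stated assumptions: the lattice resolution `res`, the `threshold`, and the function names are hypothetical, and the paper's actual mask parameters are not given here.

```python
import numpy as np


def fade(t):
    # Perlin's quintic interpolant 6t^5 - 15t^4 + 10t^3 (smooth at lattice points).
    return ((6 * t - 15) * t + 10) * t ** 3


def perlin_2d(shape, res, rng):
    """Single-octave 2D gradient noise of size `shape` with `res` lattice cells per axis."""
    h, w = shape
    ry, rx = res
    if h % ry or w % rx:
        raise ValueError("shape must be divisible by res")
    # Random unit gradient vectors at the (ry + 1) x (rx + 1) lattice corners.
    angles = 2 * np.pi * rng.random((ry + 1, rx + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    # Continuous lattice coordinates of every pixel, split into cell index and offset.
    yy, xx = np.meshgrid(np.arange(h) * ry / h, np.arange(w) * rx / w, indexing="ij")
    y0, x0 = yy.astype(int), xx.astype(int)
    fy, fx = yy - y0, xx - x0

    def corner(dy, dx):
        # Dot product of the corner's gradient with the pixel's offset from that corner.
        g = grads[y0 + dy, x0 + dx]
        return g[..., 0] * (fy - dy) + g[..., 1] * (fx - dx)

    # Smoothly interpolate the four corner contributions.
    u, v = fade(fx), fade(fy)
    top = corner(0, 0) + u * (corner(0, 1) - corner(0, 0))
    bottom = corner(1, 0) + u * (corner(1, 1) - corner(1, 0))
    return top + v * (bottom - top)


def perlin_mask(shape=(256, 256), res=(8, 8), threshold=0.5, seed=0):
    """Binary defect mask: 1 where the noise exceeds `threshold` (hypothetical default)."""
    rng = np.random.default_rng(seed)
    return (perlin_2d(shape, res, rng) > threshold).astype(np.float32)
```

Summing several octaves (noise at doubled resolution with halved amplitude) and randomly rotating the mask are common extensions; both are left out to keep the sketch short.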
What carries the argument
Asymmetric teacher-student network in which the teacher extracts stable normal feature representations while the student reconstructs normal patterns and amplifies anomaly discrepancies, trained on data generated by a diffusion model.
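To make the teacher-student discrepancy concrete, here is a minimal NumPy sketch of how such networks are commonly scored: one minus the cosine similarity between teacher and student features at each location, fused over scales. Teacher and student features are assumed to be given as (C, H, W) arrays; the nearest-neighbour upsampling, sum fusion, and max-pooled image score are illustrative choices, not details confirmed by the source.

```python
import numpy as np


def cosine_anomaly_map(t_feat, s_feat, eps=1e-8):
    """Per-location 1 - cos(teacher, student); inputs are (C, H, W) feature maps."""
    dot = (t_feat * s_feat).sum(axis=0)
    norms = np.linalg.norm(t_feat, axis=0) * np.linalg.norm(s_feat, axis=0)
    return 1.0 - dot / (norms + eps)


def upsample_nearest(a, out_hw):
    # Integer-factor nearest-neighbour upsampling (assumes out_hw is a multiple of a.shape).
    fh, fw = out_hw[0] // a.shape[0], out_hw[1] // a.shape[1]
    return np.repeat(np.repeat(a, fh, axis=0), fw, axis=1)


def fused_anomaly_map(feature_pairs, out_hw):
    """Sum per-scale maps after upsampling; the image score is the map maximum."""
    total = np.zeros(out_hw)
    for t_feat, s_feat in feature_pairs:
        total += upsample_nearest(cosine_anomaly_map(t_feat, s_feat), out_hw)
    return total, float(total.max())
```

Where the student matches the teacher the map stays near 0; where the student's features point away from the teacher's, it approaches 2, which is the "amplified discrepancy" the architecture relies on.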
Load-bearing premise
The defects created by the diffusion model using Gaussian noise and Perlin masks are sufficiently similar to real defects to train a detector that works on actual industrial surfaces.
What would settle it
A comparison where the model performs poorly on a set of real defects whose visual properties do not match those of the generated ones would indicate the generation step is insufficient.
Original abstract
Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions, and difficulties in accurately localizing subtle defects under complex backgrounds. To address these challenges, this paper proposes an unsupervised defect detection method that integrates a Denoising Diffusion Probabilistic Model (DDPM) with an asymmetric teacher-student architecture. First, at the data level, the DDPM is trained solely on normal samples. By introducing constant-variance Gaussian perturbations and Perlin noise-based masks, high-fidelity and physically consistent defect samples along with pixel-level annotations are generated, effectively alleviating the data scarcity problem. Second, at the model level, an asymmetric dual-stream network is constructed. The teacher network provides stable representations of normal features, while the student network reconstructs normal patterns and amplifies discrepancies between normal and anomalous regions. Finally, a joint optimization strategy combining cosine similarity loss and pixel-wise segmentation supervision is adopted to achieve precise localization of subtle defects. Experimental results on the MVTecAD dataset show that the proposed method achieves 98.4% image-level AUROC and 98.3% pixel-level AUROC, significantly outperforming existing unsupervised and mainstream deep learning methods. The proposed approach does not require large amounts of real defect samples and enables accurate and robust industrial defect detection and localization.
Keywords: Industrial defect detection · diffusion models · data generation · teacher-student architecture · pixel-level localization
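The data-level step can be illustrated with a short NumPy sketch, under the assumption that "constant-variance Gaussian perturbation" means adding zero-mean noise with a fixed standard deviation inside a mask. How the trained DDPM is coupled to this step is not specified in the abstract, so it appears only as a hypothetical `refine` hook. Blending as x_syn = (1 - M) * x + M * x_tilde yields the pixel-level label M at no extra cost.

```python
import numpy as np


def synthesize_defect(x0, mask, sigma, rng, refine=None):
    """Paste a constant-variance Gaussian perturbation into `mask`; return (image, label).

    x0:     normal image in [0, 1], shape (H, W, C)
    mask:   binary mask of shape (H, W), e.g. a thresholded Perlin-noise map
    sigma:  fixed noise standard deviation (hypothetical value; a free parameter)
    refine: optional callable standing in for the paper's DDPM-based step
            (hypothetical hook; skipped if None)
    """
    noisy = x0 + sigma * rng.standard_normal(x0.shape)
    if refine is not None:
        noisy = refine(noisy)
    noisy = np.clip(noisy, 0.0, 1.0)
    m = mask[..., None].astype(x0.dtype)
    # Pixels outside the mask are copied unchanged; inside, the perturbed content is used.
    x_syn = (1.0 - m) * x0 + m * noisy
    return x_syn, mask.astype(np.uint8)
```

Because the label is the mask itself, annotations are exact by construction; whether the pasted content resembles real defects is a separate question, which is the load-bearing premise above.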
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an unsupervised industrial surface defect detection method that trains a DDPM solely on normal samples, then synthesizes defects via constant-variance Gaussian perturbations and Perlin noise masks to produce both images and pixel-level annotations. These synthetic data train an asymmetric teacher-student network in which the teacher supplies stable normal representations while the student reconstructs normal patterns and amplifies anomalies; joint optimization combines a cosine similarity loss with pixel-wise segmentation supervision. On MVTecAD the method reports 98.4% image-level AUROC and 98.3% pixel-level AUROC, outperforming prior unsupervised and mainstream deep learning baselines without requiring real defect samples.
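A minimal sketch of the joint objective summarized above, assuming (i) the cosine term is one minus the teacher-student cosine similarity averaged over locations, (ii) binary cross-entropy as a stand-in for the unspecified pixel-wise segmentation loss, and (iii) a hypothetical weight `lam` on the segmentation term. Under a denoising-student reading (also an assumption), the teacher would see the clean normal image and the student its synthetic defective version.

```python
import numpy as np


def cosine_loss(t_feat, s_feat, eps=1e-8):
    # Mean over locations of 1 - cos(teacher, student); features are (C, H, W).
    dot = (t_feat * s_feat).sum(axis=0)
    norms = np.linalg.norm(t_feat, axis=0) * np.linalg.norm(s_feat, axis=0)
    return float(np.mean(1.0 - dot / (norms + eps)))


def bce_loss(pred, target, eps=1e-7):
    # Pixel-wise binary cross-entropy between predicted probabilities and the synthetic mask.
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def joint_loss(t_feat, s_feat, seg_prob, seg_mask, lam=1.0):
    """L = L_cos + lam * L_seg, with lam a hypothetical weighting."""
    return cosine_loss(t_feat, s_feat) + lam * bce_loss(seg_prob, seg_mask)
```

The weight `lam` directly trades feature alignment against localization, which is why its value matters for the robustness questions raised in the major comments.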
Significance. If the synthetic defects prove distributionally faithful to real industrial anomalies, the approach would meaningfully alleviate data scarcity and long-tailed distributions that currently limit supervised defect detectors. The combination of diffusion-based synthesis with an asymmetric distillation architecture is a coherent and potentially reusable design pattern for other anomaly-detection domains.
major comments (3)
- [defect synthesis procedure (Section 3.1)] The central performance claim rests on the assertion that DDPM-generated defects (constant-variance Gaussian + Perlin masks) are high-fidelity and physically consistent with real MVTecAD anomalies, yet the manuscript supplies no distributional metrics (FID, MMD, or expert visual scoring) comparing synthetic versus real defect images, nor any ablation replacing the Perlin-masked outputs with random noise or simpler masks. Without this evidence it is impossible to rule out that the student network is merely learning Perlin-specific artifacts rather than genuine defect cues.
- [Experiments and Results (Section 4)] The experimental section reports single-point AUROC figures (98.4% / 98.3%) with no error bars, no repeated runs with different random seeds, and no statistical significance tests against the reproduced baselines. This makes the claim of “significant outperformance” difficult to evaluate.
- [Implementation details and ablation studies (Section 4.2)] No ablation is presented on the free parameters listed in the method (Gaussian perturbation variance, Perlin noise mask parameters, or the weighting between cosine similarity and segmentation losses). Consequently it is unclear whether the reported numbers are robust or the result of post-hoc tuning on the test set.
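Because these comments turn on single-point AUROC values, a minimal sketch of how image-level and pixel-level AUROC are typically computed may help readers check reported numbers. It uses the rank-based (Mann-Whitney) form with average ranks for ties. Taking the maximum of each anomaly map as the image score is a common convention assumed here, not one confirmed for this paper; `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties get average ranks)."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative samples")
    # Average 1-based rank of each group of tied scores.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def image_and_pixel_auroc(anomaly_maps, gt_masks):
    """anomaly_maps, gt_masks: arrays of shape (N, H, W); masks are 1 on defect pixels."""
    maps = np.asarray(anomaly_maps, dtype=float)
    masks = np.asarray(gt_masks).astype(bool)
    image_scores = maps.reshape(len(maps), -1).max(axis=1)    # max-pooled image score
    image_labels = masks.reshape(len(masks), -1).any(axis=1)  # defective if any defect pixel
    return auroc(image_scores, image_labels), auroc(maps, masks)
```

Pixel-level AUROC pools every pixel of every test image, so it is dominated by the large number of normal pixels; this is one reason the per-region overlap (PRO) metric [28] is often reported alongside it.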
minor comments (2)
- [Abstract and Introduction] The abstract and introduction repeatedly use the phrase “high-fidelity and physically consistent” without defining the criteria; a short paragraph clarifying the intended meaning would improve readability.
- [Experimental setup] Baseline implementation details (exact architectures, training epochs, data augmentations) are referenced only by citation; a brief table summarizing the reproduced settings would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of combining diffusion-based defect synthesis with an asymmetric teacher-student architecture. We agree that stronger empirical validation is required for the synthetic data fidelity, experimental reproducibility, and hyperparameter robustness. We address each major comment below and will revise the manuscript to incorporate the requested evidence and analyses.
Point-by-point responses
Referee: [defect synthesis procedure (Section 3.1)] The central performance claim rests on the assertion that DDPM-generated defects (constant-variance Gaussian + Perlin masks) are high-fidelity and physically consistent with real MVTecAD anomalies, yet the manuscript supplies no distributional metrics (FID, MMD, or expert visual scoring) comparing synthetic versus real defect images, nor any ablation replacing the Perlin-masked outputs with random noise or simpler masks. Without this evidence it is impossible to rule out that the student network is merely learning Perlin-specific artifacts rather than genuine defect cues.
Authors: We agree that quantitative validation of the synthetic defects is essential to support our claims of high fidelity. In the revised manuscript we will add FID and MMD scores computed between the DDPM-generated defect images and the real defect samples in MVTecAD. We will also include side-by-side visual comparisons and, to the extent possible, a small-scale expert visual scoring study. In addition, we will insert an ablation in Section 4.2 that replaces the Perlin masks with random Gaussian noise masks (and simpler alternatives) while keeping all other components fixed, thereby demonstrating that performance gains arise from realistic defect cues rather than mask-specific artifacts. revision: yes
Referee: [Experiments and Results (Section 4)] The experimental section reports single-point AUROC figures (98.4% / 98.3%) with no error bars, no repeated runs with different random seeds, and no statistical significance tests against the reproduced baselines. This makes the claim of “significant outperformance” difficult to evaluate.
Authors: We acknowledge that single-run point estimates without statistical support weaken the evaluation. In the revision we will repeat all experiments (including baseline reproductions) with at least five different random seeds, report mean image-level and pixel-level AUROC together with standard deviations, and include paired t-tests (or equivalent non-parametric tests) to establish statistical significance of the improvements over the reproduced baselines. revision: yes
Referee: [Implementation details and ablation studies (Section 4.2)] No ablation is presented on the free parameters listed in the method (Gaussian perturbation variance, Perlin noise mask parameters, or the weighting between cosine similarity and segmentation losses). Consequently it is unclear whether the reported numbers are robust or the result of post-hoc tuning on the test set.
Authors: We agree that sensitivity analysis of the listed hyperparameters is necessary to demonstrate robustness. We will expand Section 4.2 with systematic ablations on (i) the constant-variance Gaussian perturbation level, (ii) Perlin noise parameters such as scale and octaves, and (iii) the relative weighting between the cosine-similarity and pixel-wise segmentation losses. These studies will be performed on a held-out validation split to avoid test-set tuning and will be summarized with performance curves or tables. revision: yes
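A minimal sketch of the seed-level comparison promised in the second response: the mean and standard deviation of the paired per-seed differences, plus an exact two-sided sign-flip permutation test, which is one possible non-parametric alternative to the paired t-test. Nothing here reflects the authors' actual protocol; the per-seed scores passed in are placeholders.

```python
import itertools
import statistics


def _paired_diffs(a, b):
    # One score per seed for each method; differences are taken seed by seed.
    if len(a) != len(b):
        raise ValueError("a and b must have one score per seed")
    return [x - y for x, y in zip(a, b)]


def paired_summary(a, b):
    """Mean and sample standard deviation of per-seed differences a[i] - b[i]."""
    diffs = _paired_diffs(a, b)
    return statistics.mean(diffs), statistics.stdev(diffs)


def sign_flip_pvalue(a, b):
    """Exact two-sided paired sign-flip permutation test on the mean difference."""
    diffs = _paired_diffs(a, b)
    n = len(diffs)
    observed = abs(sum(diffs))
    hits = 0
    for signs in itertools.product((1, -1), repeat=n):
        # Under H0 each per-seed difference is equally likely to have either sign.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n
```

One consequence worth keeping in mind: with n paired seeds the exact test enumerates 2^n equally likely sign patterns, so even when the method wins on every seed the two-sided p-value cannot fall below 2/2^n. For five seeds that floor is 2/32 = 0.0625, so a 0.05 threshold is unreachable by this test alone.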
Circularity Check
No circularity; empirical method with independent benchmark evaluation
Full rationale
The paper describes an empirical pipeline: DDPM trained only on normal samples, followed by synthetic defect generation via constant-variance Gaussian noise plus Perlin masks, then training of an asymmetric teacher-student network with cosine similarity and segmentation losses. Reported performance consists of AUROC numbers on the public MVTecAD benchmark. No equations, fitted parameters, or self-citations are invoked in a way that reduces the central claims to the inputs by construction. The derivation chain is self-contained and externally falsifiable via the stated dataset and metrics.
Axiom & Free-Parameter Ledger
free parameters (3)
- Gaussian perturbation variance
- Perlin noise mask parameters
- Loss weighting between cosine similarity and segmentation supervision
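These three free parameters are the natural axes of the validation-only sweep promised in the rebuttal. A minimal sketch, assuming a user-supplied `evaluate_on_validation` callable that trains and scores one configuration on a held-out split; the parameter names and grid values in the usage comment are hypothetical placeholders, not the paper's settings.

```python
import itertools


def sweep_on_validation(grid, evaluate_on_validation):
    """Score every configuration in `grid` on a validation split and return the best one.

    grid: dict mapping a parameter name to a list of candidate values.
    evaluate_on_validation: callable(config: dict) -> float (higher is better);
        it must never read the test split, which is evaluated once at the very end.
    Returns (best_config, best_score, all_results) so sensitivity curves can be plotted.
    """
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, evaluate_on_validation(config)))
    best_config, best_score = max(results, key=lambda item: item[1])
    return best_config, best_score, results


# Usage with hypothetical names and values:
# grid = {"noise_sigma": [0.05, 0.1, 0.2], "perlin_res": [4, 8, 16], "seg_weight": [0.5, 1.0, 2.0]}
# best, score, results = sweep_on_validation(grid, my_validation_scorer)
```

Keeping `all_results` rather than only the winner is what allows the per-parameter performance curves mentioned in the rebuttal.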
axioms (2)
- Domain assumption: a DDPM trained only on normal samples can produce high-fidelity, physically consistent defect images when combined with Gaussian perturbations and Perlin masks
- Domain assumption: the teacher network provides stable normal representations that the student can reliably reconstruct except at defect locations
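The first assumption is the one the referee asks to test with FID or MMD. As a minimal sketch, here is an unbiased estimate of the squared maximum mean discrepancy (MMD) with an RBF kernel, computed on feature vectors of synthetic and real defect crops. How those features are extracted (for example, with a pretrained backbone) and the default bandwidth `gamma` are assumptions. FID, by contrast, compares Gaussian fits of Inception features and is not reproduced here.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    # k(a, b) = exp(-gamma * ||a - b||^2) for all rows of x (m, d) and y (n, d).
    sq = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2_unbiased(x, y, gamma=None):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d).

    Values near zero (possibly slightly negative) suggest the two feature
    distributions are hard to tell apart under this kernel.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # simple default bandwidth; a hypothetical choice
    m, n = len(x), len(y)
    kxx, kyy, kxy = rbf_kernel(x, x, gamma), rbf_kernel(y, y, gamma), rbf_kernel(x, y, gamma)
    # Drop the diagonal (self-similarity) terms to make the within-sample averages unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())
```

A permutation test over the pooled features (reshuffling which rows count as "synthetic" and which as "real") would turn this statistic into a p-value.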
Reference graph
Works this paper leans on
[1] Bhatt P M, Malhan R K, Rajendran P, et al. Image-based surface defect detection using deep learning: A review[J]. Journal of Computing and Information Science in Engineering, 2021, 21(4): 040801
[2] Niu S, Wang Y, Wang F, et al. Defect image sample generation with GAN for improving defect recognition[J]. IEEE Transactions on Automation Science and Engineering, 2020, 18(3): 1071–1082
[3] Duan Y, Liu J, Wang Z, et al. Few-shot defect image generation via defect-aware feature manipulation[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023: 1346–1354
[4] Nichol A Q, Dhariwal P. Improved denoising diffusion probabilistic models[C]. International Conference on Machine Learning, 2021: 8162–8171
[5] Livernoche V, Köhler L, Eisenbacher M, et al. On diffusion modeling for anomaly detection[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024: 2032–2041
[6] Rombach R, Blattmann A, Lorenz R, et al. High-resolution image synthesis with latent diffusion models[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 10684–10695
[7] Song J, Meng C, Ermon S. Denoising diffusion implicit models[C]. International Conference on Learning Representations, 2020
[8] Song Y, Sohl-Dickstein J, Kingma D P, et al. Score-based generative modeling through stochastic differential equations[C]. International Conference on Learning Representations, 2020
[9] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models[C]. Advances in Neural Information Processing Systems, 2020, 33: 6840–6851
[10] Dhariwal P, Nichol A. Diffusion models beat GANs on image synthesis[C]. Advances in Neural Information Processing Systems, 2021, 34: 8780–8794
[11] Canny J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8(6): 679–698
[12] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971–987
[13] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015: 234–241
[14] Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal networks[C]. Advances in Neural Information Processing Systems, 2015, 28: 91–99
[15] Roth K, Pemula L, Zepeda J, et al. Patchcore: Towards total recall in industrial anomaly detection[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 14009–14019
[16] Wang G, Han S, Ding E, et al. Student-teacher feature pyramid matching for anomaly detection[C]. Proceedings of the British Machine Vision Conference, 2021: 1–14
[17] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation policies from data[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 113–123
[18] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: Practical automated data augmentation with a reduced search space[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 702–703
[19] Zhang G, Cui K, Hung T Y, et al. Defect-gan: High-fidelity defect synthesis for automated defect inspection[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021: 2515–2524
[20] Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013
[21] Li C L, Sohn K, Yoon J, et al. Cutpaste: Self-supervised learning for anomaly detection and localization[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 9664–9674
[22] Zavrtanik V, Kristan M, Skocaj D. Draem-a discriminatively trained reconstruction embedding for surface anomaly detection[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 8330–8339
[23] Zhang H, Wang Z, Wu Z, et al. Diffusionad: Denoising diffusion for anomaly detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 843–852
[24] Zhang X, Li S, Li X, et al. Destseg: Segmentation guided denoising student-teacher for anomaly detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 3914–3923
[25] Jeong J, Kim S, Seo D, et al. Winclip: Zero-/few-shot anomaly classification and segmentation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 19613–19622
[26] Bergmann P, Fauser M, Sattlegger D, et al. MVTec AD – A comprehensive real-world dataset for unsupervised anomaly detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9592–9600
[27] Bergmann P, Fauser M, Sattlegger D, et al. Uninformed students: Student-teacher anomaly detection with discriminative latent features[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8771–8780
[28] Schneider T, Bergmann P, Steger C. The per-region overlap (PRO) score: A fair evaluation metric for anomaly localization[J]. arXiv preprint arXiv:2009.14067, 2020
[29] Liu Z, Wang Y, Han Y, et al. SimpleNet: A simple network for image anomaly detection and localization[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 20402–20411
[30] Zhang X, Xu M, Zhou X. RealNet: A feature selection network with realistic synthetic anomaly for anomaly detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 23678–23687
[31] Gudovskiy D, Ishizaka S, Kozuka K. CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022: 1434–1442
[32] Lei J, Hu X, Wang Y, et al. PyramidFlow: High-resolution defect contrastive localization using pyramid normalizing flow[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14143–14152
[33] He H, Zhang J, Chen H, et al. DiAD: A diffusion-based framework for multi-class anomaly detection[J]. arXiv preprint arXiv:2312.06607, 2023
[34] Chen L, You Z, Zhang N, et al. UTRAD: Anomaly detection and localization with U-Transformer[J]. Neural Networks, 2022, 147: 53–62