arxiv: 2604.19240 · v1 · submitted 2026-04-21 · 💻 cs.AI

Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network

Pith reviewed 2026-05-10 02:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords industrial defect detection · diffusion models · synthetic data generation · teacher-student network · unsupervised anomaly detection · pixel-level localization

The pith

A diffusion model trained only on normal samples generates realistic defects to train an asymmetric teacher-student network for unsupervised industrial surface defect detection and localization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome the lack of defect samples for training detection systems in manufacturing by creating artificial defects. It does this by training a denoising diffusion model exclusively on defect-free images and then adding structured noise to produce varied, realistic-looking defects complete with location labels. These synthetic examples are used to train a two-part network where one part stays focused on normal patterns and the other learns to spot and mark deviations. The result is a system that can both classify images as defective or not and highlight the exact defective areas. Such an approach matters because it removes the need to collect and label rare defect occurrences, potentially making automated quality control more practical across industries.

Core claim

A denoising diffusion probabilistic model trained solely on normal samples generates defects through constant-variance Gaussian perturbations and Perlin noise-based masks. An asymmetric teacher-student network trained on these synthetic samples with cosine similarity and pixel-wise losses reaches 98.4% image-level AUROC and 98.3% pixel-level AUROC for unsupervised defect detection and localization on the MVTecAD dataset.
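The mask-and-perturb part of the synthesis step can be illustrated with a minimal sketch. It is not the authors' implementation: the DDPM denoising pass is omitted entirely, and the function names (`perlin_2d`, `synthesize_defect`), the lattice resolution `res`, the `threshold`, and `sigma` are all hypothetical choices made for illustration. The sketch thresholds Perlin-style gradient noise into a binary mask and applies constant-variance Gaussian noise only inside that mask, returning the image together with its pixel-level label.

```python
import numpy as np

def perlin_2d(shape, res, rng):
    """Perlin-style gradient noise; each side of `shape` must be divisible by `res`."""
    h, w = shape
    dy, dx = h // res[0], w // res[1]
    # Random unit gradient vectors at the lattice corners.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(res[0] + 1, res[1] + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    # Lattice cell index and fractional offset of every pixel.
    ys, xs = np.arange(h) / dy, np.arange(w) / dx
    yi, xi = ys.astype(int), xs.astype(int)
    yf, xf = (ys - yi)[:, None], (xs - xi)[None, :]

    def corner(oy, ox):
        # Dot product of the corner gradient with the offset to that corner.
        g = grads[yi[:, None] + oy, xi[None, :] + ox]
        return g[..., 0] * (xf - ox) + g[..., 1] * (yf - oy)

    def fade(t):
        return 6 * t**5 - 15 * t**4 + 10 * t**3

    u, v = fade(xf), fade(yf)
    top = corner(0, 0) * (1 - u) + corner(0, 1) * u
    bottom = corner(1, 0) * (1 - u) + corner(1, 1) * u
    return top * (1 - v) + bottom * v

def synthesize_defect(image, rng, res=(4, 4), threshold=0.2, sigma=0.1):
    """Return (defect_image, mask) for an image with values in [0, 1].

    Assumption: the perturbed region is used directly; the paper additionally
    relies on a DDPM trained on normal samples, which is not modeled here.
    """
    noise = perlin_2d(image.shape[:2], res, rng)
    mask = (noise > threshold).astype(np.float32)
    perturbed = image + rng.normal(0.0, sigma, size=image.shape)
    m = mask if image.ndim == 2 else mask[..., None]
    defect = np.clip(m * perturbed + (1 - m) * image, 0.0, 1.0)
    return defect, mask
```

Because the mask is produced alongside the image, every synthetic sample comes with a pixel-level annotation at no extra labeling cost, which is the property the segmentation supervision depends on.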

What carries the argument

Asymmetric teacher-student network in which the teacher extracts stable normal feature representations while the student reconstructs normal patterns and amplifies anomaly discrepancies, trained on data generated by a diffusion model.
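How a teacher-student discrepancy becomes an anomaly map can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: the feature maps are taken to be arrays of shape `(C, H, W)` from a single layer, and the function names are hypothetical. Each spatial location is scored by the cosine distance between the channel-wise normalized teacher and student features, and the mean of that map serves as a training loss on normal images.

```python
import numpy as np

def cosine_anomaly_map(teacher_feat, student_feat, eps=1e-8):
    """Per-location cosine distance between two (C, H, W) feature maps."""
    # Channel-wise L2 normalization at every spatial location.
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=0, keepdims=True) + eps)
    s = student_feat / (np.linalg.norm(student_feat, axis=0, keepdims=True) + eps)
    # 0 where the features agree in direction, up to 2 where they are opposite.
    return 1.0 - np.sum(t * s, axis=0)

def cosine_loss(teacher_feat, student_feat):
    """Mean cosine distance; small when the student matches the teacher."""
    return float(cosine_anomaly_map(teacher_feat, student_feat).mean())
```

At inference, locations where the student fails to reproduce the teacher's features receive high values in the map, which is what makes the discrepancy usable for localization.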

Load-bearing premise

The defects created by the diffusion model using Gaussian noise and Perlin masks are sufficiently similar to real defects to train a detector that works on actual industrial surfaces.

What would settle it

A comparison where the model performs poorly on a set of real defects whose visual properties do not match those of the generated ones would indicate the generation step is insufficient.

Figures

Figures reproduced from arXiv: 2604.19240 by Guangcan Liu, Runlin Zhou, Shuo Feng, Yuyang Li.

Figure 1. The overall pipeline of the local defect synthesis method based on Perlin noise mask.
Excerpt (Section 3.2, Dataset Reconstruction Scheme for Downstream Denoising and Defect Detection Tasks): To better adapt to downstream denoising and defect detection tasks, this paper proposes a structured triplet-based sample construction strategy. During the data organization and loading stage, the indexing mechanism of raw samples is re…
Figure 2. Multi-Task Jointly Driven Training Process.
Figure 3. Multi-Task Jointly Driven Inference Process.
Excerpt (Section 3.3.3, Loss Function): The core training objective of our asymmetric dual-stream framework is to enforce the student decoder to produce feature representations that closely match those extracted by the frozen teacher network on clean defect-free images. Leveraging cross-layer spatial topology alignment and channel-wise L2 normalization, the spatial distance between…
Figure 4. Visualization of defect localization results on the MVTec AD dataset.
Excerpt (Section 4.5, Ablation Study): To verify the influence of each core module on detection performance, this chapter conducts an ablation study based on the MVTec AD dataset. Key components including the decoder, cosine loss, segmentation head, and loss function are removed or replaced respectively to quantify the necessity of each structure. Experim…

Abstract

Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions, and difficulties in accurately localizing subtle defects under complex backgrounds. To address these challenges, this paper proposes an unsupervised defect detection method that integrates a Denoising Diffusion Probabilistic Model (DDPM) with an asymmetric teacher-student architecture. First, at the data level, the DDPM is trained solely on normal samples. By introducing constant-variance Gaussian perturbations and Perlin noise-based masks, high-fidelity and physically consistent defect samples along with pixel-level annotations are generated, effectively alleviating the data scarcity problem. Second, at the model level, an asymmetric dual-stream network is constructed. The teacher network provides stable representations of normal features, while the student network reconstructs normal patterns and amplifies discrepancies between normal and anomalous regions. Finally, a joint optimization strategy combining cosine similarity loss and pixel-wise segmentation supervision is adopted to achieve precise localization of subtle defects. Experimental results on the MVTecAD dataset show that the proposed method achieves 98.4% image-level AUROC and 98.3% pixel-level AUROC, significantly outperforming existing unsupervised and mainstream deep learning methods. The proposed approach does not require large amounts of real defect samples and enables accurate and robust industrial defect detection and localization.

Keywords: industrial defect detection, diffusion models, data generation, teacher-student architecture, pixel-level localization

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an unsupervised industrial surface defect detection method that trains a DDPM solely on normal samples, then synthesizes defects via constant-variance Gaussian perturbations and Perlin noise masks to produce both images and pixel-level annotations. These synthetic data train an asymmetric teacher-student network in which the teacher supplies stable normal representations while the student reconstructs normals and amplifies anomalies; joint optimization uses cosine similarity plus pixel-wise segmentation supervision. On MVTecAD the method reports 98.4% image-level AUROC and 98.3% pixel-level AUROC, outperforming prior unsupervised and supervised baselines without requiring real defect samples.
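For readers unfamiliar with the two reported metrics, the following sketch shows one standard way to compute AUROC from scores and binary labels, using the rank-sum (Mann-Whitney) identity. It is a generic illustration rather than the paper's evaluation code, and the function name `auroc` is an assumption. Image-level AUROC would pass one score and one label per image; pixel-level AUROC would pass the flattened anomaly maps and flattened ground-truth masks of the whole test set.

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve via average ranks (ties handled)."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Average 1-based rank of every distinct score value.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # Mann-Whitney U of the positive class, normalized to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

A value of 0.5 corresponds to a scorer that cannot separate the classes, and 1.0 to one that ranks every defective sample above every normal one.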

Significance. If the synthetic defects prove distributionally faithful to real industrial anomalies, the approach would meaningfully alleviate data scarcity and long-tailed distributions that currently limit supervised defect detectors. The combination of diffusion-based synthesis with an asymmetric distillation architecture is a coherent and potentially reusable design pattern for other anomaly-detection domains.

major comments (3)
  1. [defect synthesis procedure (Section 3.1)] The central performance claim rests on the assertion that DDPM-generated defects (constant-variance Gaussian + Perlin masks) are high-fidelity and physically consistent with real MVTecAD anomalies, yet the manuscript supplies no distributional metrics (FID, MMD, or expert visual scoring) comparing synthetic versus real defect images, nor any ablation replacing the Perlin-masked outputs with random noise or simpler masks. Without this evidence it is impossible to rule out that the student network is merely learning Perlin-specific artifacts rather than genuine defect cues.
  2. [Experiments and Results (Section 4)] The experimental section reports single-point AUROC figures (98.4% / 98.3%) with no error bars, no repeated runs with different random seeds, and no statistical significance tests against the reproduced baselines. This makes the claim of “significant outperformance” difficult to evaluate.
  3. [Implementation details and ablation studies (Section 4.2)] No ablation is presented on the free parameters listed in the method (Gaussian perturbation variance, Perlin noise mask parameters, or the weighting between cosine similarity and segmentation losses). Consequently it is unclear whether the reported numbers are robust or the result of post-hoc tuning on the test set.
minor comments (2)
  1. [Abstract and Introduction] The abstract and introduction repeatedly use the phrase “high-fidelity and physically consistent” without defining the criteria; a short paragraph clarifying the intended meaning would improve readability.
  2. [Experimental setup] Baseline implementation details (exact architectures, training epochs, data augmentations) are referenced only by citation; a brief table summarizing the reproduced settings would aid reproducibility.
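One of the distributional metrics requested in major comment 1, MMD, can be sketched as below. This is a minimal illustration under stated assumptions: the inputs are feature vectors (one row per image) produced by some feature extractor that the report does not specify, the kernel is a Gaussian RBF, and the `bandwidth` value is a hypothetical default that would need tuning in practice.

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between the rows of x and the rows of y."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped against rounding error.
        sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())
```

Values near zero indicate that the synthetic and real feature sets are hard to tell apart under the chosen kernel; larger values indicate a distribution gap.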

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of combining diffusion-based defect synthesis with an asymmetric teacher-student architecture. We agree that stronger empirical validation is required for the synthetic data fidelity, experimental reproducibility, and hyperparameter robustness. We address each major comment below and will revise the manuscript to incorporate the requested evidence and analyses.

Point-by-point responses
  1. Referee: [defect synthesis procedure (Section 3.1)] The central performance claim rests on the assertion that DDPM-generated defects (constant-variance Gaussian + Perlin masks) are high-fidelity and physically consistent with real MVTecAD anomalies, yet the manuscript supplies no distributional metrics (FID, MMD, or expert visual scoring) comparing synthetic versus real defect images, nor any ablation replacing the Perlin-masked outputs with random noise or simpler masks. Without this evidence it is impossible to rule out that the student network is merely learning Perlin-specific artifacts rather than genuine defect cues.

    Authors: We agree that quantitative validation of the synthetic defects is essential to support our claims of high fidelity. In the revised manuscript we will add FID and MMD scores computed between the DDPM-generated defect images and the real defect samples in MVTecAD. We will also include side-by-side visual comparisons and, to the extent possible, a small-scale expert visual scoring study. In addition, we will insert an ablation in Section 4.2 that replaces the Perlin masks with random Gaussian noise masks (and simpler alternatives) while keeping all other components fixed, thereby demonstrating that performance gains arise from realistic defect cues rather than mask-specific artifacts. revision: yes

  2. Referee: [Experiments and Results (Section 4)] The experimental section reports single-point AUROC figures (98.4% / 98.3%) with no error bars, no repeated runs with different random seeds, and no statistical significance tests against the reproduced baselines. This makes the claim of “significant outperformance” difficult to evaluate.

    Authors: We acknowledge that single-run point estimates without statistical support weaken the evaluation. In the revision we will repeat all experiments (including baseline reproductions) with at least five different random seeds, report mean image-level and pixel-level AUROC together with standard deviations, and include paired t-tests (or equivalent non-parametric tests) to establish statistical significance of the improvements over the reproduced baselines. revision: yes

  3. Referee: [Implementation details and ablation studies (Section 4.2)] No ablation is presented on the free parameters listed in the method (Gaussian perturbation variance, Perlin noise mask parameters, or the weighting between cosine similarity and segmentation losses). Consequently it is unclear whether the reported numbers are robust or the result of post-hoc tuning on the test set.

    Authors: We agree that sensitivity analysis of the listed hyperparameters is necessary to demonstrate robustness. We will expand Section 4.2 with systematic ablations on (i) the constant-variance Gaussian perturbation level, (ii) Perlin noise parameters such as scale and octaves, and (iii) the relative weighting between the cosine-similarity and pixel-wise segmentation losses. These studies will be performed on a held-out validation split to avoid test-set tuning and will be summarized with performance curves or tables. revision: yes
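The multi-seed reporting promised in response 2 can be sketched with the standard library alone. The helper names are hypothetical and no numbers from the paper are used. The sketch reports mean and sample standard deviation over seeds and computes a paired t statistic with its degrees of freedom; the p-value lookup is left to a statistics package or a t table.

```python
import math
import statistics

def summarize(runs):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(runs), statistics.stdev(runs)

def paired_t_statistic(ours, baseline):
    """Paired t statistic and degrees of freedom for seed-matched runs."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equally long lists with at least two runs")
    diffs = [a - b for a, b in zip(ours, baseline)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing by seed only makes sense if both methods are run with the same seeds and data splits; otherwise an unpaired test would be the appropriate choice.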

Circularity Check

0 steps flagged

No circularity; empirical method with independent benchmark evaluation

full rationale

The paper describes an empirical pipeline: DDPM trained only on normal samples, followed by synthetic defect generation via constant-variance Gaussian noise plus Perlin masks, then training of an asymmetric teacher-student network with cosine similarity and segmentation losses. Reported performance consists of AUROC numbers on the public MVTecAD benchmark. No equations, fitted parameters, or self-citations are invoked in a way that reduces the central claims to the inputs by construction. The derivation chain is self-contained and externally falsifiable via the stated dataset and metrics.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unverified assumption that synthetic defects are realistic enough for training, plus multiple unspecified hyperparameters in the diffusion process and network training; no new physical entities are postulated.

free parameters (3)
  • Gaussian perturbation variance
    Constant-variance noise added during defect generation; specific value chosen but not reported in abstract.
  • Perlin noise mask parameters
    Settings controlling mask shape and scale for defect localization; chosen to produce physically consistent samples.
  • Loss weighting between cosine similarity and segmentation supervision
    Balance factor in joint optimization; not specified.
axioms (2)
  • domain assumption DDPM trained only on normal samples can produce high-fidelity, physically consistent defect images when combined with Gaussian perturbations and Perlin masks
    Invoked in the data-generation stage to justify synthetic sample quality.
  • domain assumption The teacher network provides stable normal representations that the student can reliably reconstruct except at defect locations
    Core premise of the asymmetric architecture.
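
The role of the loss-weighting parameter in the ledger can be made concrete with one plausible form of the joint objective. The abstract states only that cosine similarity loss and pixel-wise segmentation supervision are combined; the weighted sum, the symbol \lambda, and the use of binary cross-entropy for the segmentation term are assumptions made here for illustration.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\cos} + \lambda \, \mathcal{L}_{\text{seg}},
\qquad
\mathcal{L}_{\cos} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W}
  \left( 1 - \frac{\langle t_{ij}, s_{ij} \rangle}{\lVert t_{ij} \rVert_2 \, \lVert s_{ij} \rVert_2} \right),
\qquad
\mathcal{L}_{\text{seg}} = -\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W}
  \left[ m_{ij} \log p_{ij} + (1 - m_{ij}) \log (1 - p_{ij}) \right]
```

Here t_{ij} and s_{ij} denote teacher and student feature vectors at location (i, j), m_{ij} is the synthetic ground-truth mask, and p_{ij} is the predicted defect probability. Under this form, \lambda is the single free parameter that trades feature alignment against segmentation accuracy, which is why its value and sensitivity matter for the reported numbers.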

pith-pipeline@v0.9.0 · 5552 in / 1479 out tokens · 129764 ms · 2026-05-10T02:47:42.154381+00:00 · methodology

