pith. sign in

arxiv: 2605.17780 · v1 · pith:UUD4QC5Vnew · submitted 2026-05-18 · 💻 cs.CV

Network Knowledge Prior Guided Learning for Data-Efficient Surface Defect Detection

Pith reviewed 2026-05-20 12:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords surface defect detectionsaliency mapsknowledge-guided lossmulti-task learningmodel interpretabilitydata-efficient trainingregularizationindustrial vision
0
0 comments X

The pith

A knowledge-guided loss enforces consistency with prior saliency maps to improve defect detection accuracy and interpretability without added inference cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-phase training process for surface defect detection. First a primary network is trained and its saliency maps are extracted as prior knowledge. Then a multi-task framework trains the main classification task while an auxiliary task uses a dedicated loss to keep the final model's saliency maps consistent with those priors. This consistency term functions as a regularizer that promotes robust features. A sympathetic reader would care because the approach tackles both the data demands and the trustworthiness problems that limit deep learning in real industrial inspection settings.

Core claim

The authors claim that training a model under an auxiliary consistency constraint between its own saliency maps and those produced by a separately trained primary network steers the model toward more robust feature representations, yielding higher accuracy and average precision on public defect datasets together with more concentrated and human-intelligible saliency maps, all without any increase in inference cost.

What carries the argument

The knowledge-guided loss term inside a multi-task learning setup that penalizes divergence between saliency maps of a final model and those generated by a primary classification network.

If this is right

  • Baseline detectors achieve higher accuracy and average precision on multiple public surface defect datasets.
  • The trained models produce saliency maps that are more concentrated and easier for humans to interpret.
  • Interpretability is incorporated directly into training with zero extra cost at inference time.
  • The consistency constraint acts as an effective regularizer that improves robustness in data-limited regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same consistency mechanism could be tested on other vision tasks that suffer from limited labels and require explainability, such as anomaly detection in medical images.
  • Replacing saliency maps with other forms of explanation as the prior might strengthen or weaken the observed gains and should be measured directly.
  • Combining the knowledge-guided loss with existing data-augmentation or semi-supervised techniques could produce further reductions in required training data.

Load-bearing premise

Saliency maps from the primary network constitute reliable prior knowledge that can be used to enforce consistency without introducing new biases or overfitting.

What would settle it

If retraining baseline detectors on the same public defect datasets with the added consistency loss produces no measurable gain in accuracy or AP, or if the resulting saliency maps are not visibly more concentrated than the baselines, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.17780 by Bingguo Liu, Dong Ye, Guodong Liu, Hang-Cheng Dong.

Figure 1
Figure 1. Figure 1: Linear approximation, polynomial approximation, and piecewise linear approximation [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
read the original abstract

Deep learning-based methods have become the de facto standard for industrial defect detection. However, their data-hungry nature and inherent "black-box" characteristics often lead to performance bottlenecks and limited trustworthiness in real-world applications. To address these challenges, this paper proposes a novel knowledge-guided loss function that seamlessly integrates model interpretability into the training process without incurring any additional inference cost. Our method operates in two phases: first, a primary classification network is trained, and its explanations, in the form of saliency maps, are generated as prior knowledge. Second, a multi-task learning framework is established, where the main task performs classification, and an auxiliary task imposes consistency between the saliency maps of the final model and the primary model. This consistency is enforced by a dedicated knowledge-guided loss term, effectively acting as a powerful regularizer to steer the model towards robust feature representations. Extensive experiments on multiple public defect datasets demonstrate that our approach consistently enhances the performance of baseline models in terms of accuracy and AP. Moreover, visual analysis reveals that the proposed method yields more concentrated and human-intelligible saliency maps. This work presents a simple yet effective paradigm for bridging the gap between model performance and interpretability, paving the way for more reliable and high-performing vision systems in industrial quality inspection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-phase approach for data-efficient surface defect detection: a primary classification network is first trained on limited defect data to produce saliency maps as prior knowledge; a multi-task framework is then used in which the main task performs classification while an auxiliary task applies a knowledge-guided loss to enforce consistency between the final model's saliency maps and those of the primary model, acting as a regularizer toward robust features. The abstract claims that this yields consistent accuracy and AP improvements over baselines on multiple public defect datasets together with more concentrated, human-intelligible saliency maps, all without extra inference cost.

Significance. If the empirical claims hold after proper validation, the work would offer a practical way to inject interpretability constraints into training for industrial vision tasks that suffer from data scarcity, potentially improving both detection performance and model trustworthiness in quality-inspection pipelines.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent gains in accuracy and AP' and 'more concentrated and human-intelligible saliency maps' is asserted without any numerical results, baseline names, dataset identifiers, or statistical significance tests appearing in the provided summary; this absence prevents assessment of whether the reported improvements are load-bearing or merely incremental.
  2. [§3] §3 (Method, knowledge-guided loss definition): the auxiliary loss directly penalizes deviation from saliency maps generated by the primary network; because those maps are themselves produced by a model trained on the same limited data, the approach risks propagating noisy or background-biased explanations rather than robust defect cues. No ablation that replaces the primary maps with randomized or degraded versions is described, leaving open whether any auxiliary consistency term would produce similar gains.
minor comments (2)
  1. [§3.2] Clarify the precise mathematical form of the knowledge-guided loss (e.g., L1, KL divergence, or cosine similarity) and its weighting hyper-parameter relative to the classification loss.
  2. [§4.4] Add quantitative saliency evaluation metrics (e.g., pointing-game accuracy or IoU against defect annotations) alongside the qualitative visual comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below, providing clarifications based on the full experimental section and offering targeted revisions where appropriate to strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent gains in accuracy and AP' and 'more concentrated and human-intelligible saliency maps' is asserted without any numerical results, baseline names, dataset identifiers, or statistical significance tests appearing in the provided summary; this absence prevents assessment of whether the reported improvements are load-bearing or merely incremental.

    Authors: The full manuscript in Section 4 reports quantitative results across multiple public datasets (NEU-DET, DAGM, and MVTec AD), comparing against several baselines including standard CNN classifiers and prior defect detection methods. Tables present mean accuracy and AP values with standard deviations over multiple runs, along with paired t-test p-values confirming statistical significance of the gains. The abstract summarizes these outcomes at a high level for brevity, as is standard practice; we will revise the abstract to explicitly reference the detailed tables and datasets in Section 4 for improved clarity. revision: partial

  2. Referee: [§3] §3 (Method, knowledge-guided loss definition): the auxiliary loss directly penalizes deviation from saliency maps generated by the primary network; because those maps are themselves produced by a model trained on the same limited data, the approach risks propagating noisy or background-biased explanations rather than robust defect cues. No ablation that replaces the primary maps with randomized or degraded versions is described, leaving open whether any auxiliary consistency term would produce similar gains.

    Authors: The primary network is trained end-to-end on the limited defect data specifically for classification, and its saliency maps are shown in Section 4 to highlight defect regions more effectively than the final model without the proposed loss. The knowledge-guided consistency term is intended to regularize toward these task-relevant cues rather than arbitrary noise. We agree that an ablation replacing the primary saliency maps with randomized or degraded versions would strengthen the claim that the specific prior is beneficial; we will add this control experiment to the revised Section 4, demonstrating that random maps yield no improvement or even degradation relative to the proposed method. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the two-phase knowledge-guided training method.

full rationale

The paper proposes a two-stage procedure in which a primary classification network is trained independently on the defect data to produce saliency maps, which are then used as fixed prior targets in a knowledge-guided consistency loss for a subsequent multi-task model. No equations or derivations are presented that reduce the final performance claims or saliency improvements to the inputs by construction; the primary maps are generated from a separate training run before being applied as guidance. Experiments on public datasets supply external validation, and no load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to force the result. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that primary-model saliency maps are trustworthy priors and on the introduction of a new loss term whose effectiveness is asserted but not derived from first principles or external benchmarks in the abstract.

axioms (1)
  • domain assumption Saliency maps from a primary classification network provide meaningful and transferable prior knowledge for guiding feature learning in surface defect detection.
    Invoked to justify the auxiliary consistency task and the knowledge-guided loss.
invented entities (1)
  • knowledge-guided loss function no independent evidence
    purpose: Enforce consistency between saliency maps of primary and final models to act as a regularizer.
    New component introduced to integrate interpretability into training; no independent evidence or external validation supplied in the abstract.

pith-pipeline@v0.9.0 · 5762 in / 1370 out tokens · 43115 ms · 2026-05-20T12:40:18.217554+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

    A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024

    Haipeng Fan and Zhongjun Qiu. A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024

  2. [2]

    Comparative study on deep-learning-based leather surface defect identification.Measurement Science and Technology, 35(1):015402, 2024

    Zhiqiang Chen, Daxing Xu, Jiehang Deng, Yi Chen, and Chuan Li. Comparative study on deep-learning-based leather surface defect identification.Measurement Science and Technology, 35(1):015402, 2024

  3. [3]

    Whole surface defect detection method for bearing rings based on machine vision.Measurement Science and Technology, 34(1):015017, 2023

    Zhou Ping, Zhang Chuangchuang, Zhou Gongbo, He Zhenzhi, Yan Xiaodong, Wang Shihao, Sun Meng, and Hu Bing. Whole surface defect detection method for bearing rings based on machine vision.Measurement Science and Technology, 34(1):015017, 2023

  4. [4]

    Deep learning.nature, 521(7553):436–444, 2015

    Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015. 11 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al Table 6: Ablation on dataset KolektorSDD2 (AP in %). MethodsN a = 0N a = 16N a = 53N a = 126N a = 246(N all ) FullGrad 68.91 88.33 90.56 93.57 93.96 LayerCAM 68.91 90.88 90.47 92.53 93.94

  5. [5]

    Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection.CIRP annals, 65(1):417–420, 2016

    Daniel Weimer, Bernd Scholz-Reiter, and Moshe Shpitalni. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection.CIRP annals, 65(1):417–420, 2016

  6. [7]

    A multitask learning-based neural network for defect detection on textured surfaces under weak supervision.IEEE transactions on instrumen- tation and measurement, 70:1–14, 2021

    Kuikui He, Xiaotao Liu, Jing Liu, and Peng Wu. A multitask learning-based neural network for defect detection on textured surfaces under weak supervision.IEEE transactions on instrumen- tation and measurement, 70:1–14, 2021

  7. [9]

    A survey on image data augmentation for deep learning.Journal of big data, 6(1):1–48, 2019

    Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning.Journal of big data, 6(1):1–48, 2019

  8. [10]

    Peeking inside the black-box: a survey on explainable artificial intelligence (xai).IEEE access, 6:52138–52160, 2018

    Amina Adadi and Mohammed Berrada. Peeking inside the black-box: a survey on explainable artificial intelligence (xai).IEEE access, 6:52138–52160, 2018

  9. [11]

    Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai.Information fusion, 58:82–115, 2020

    Alejandro Barredo Arrieta, Natalia D´ ıaz-Rodr´ ıguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garc´ ıa, Sergio Gil-L´ opez, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai.Information fusion, 58:82–115, 2020

  10. [14]

    Structured knowledge distillation for semantic segmentation

    Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, and Jingdong Wang. Structured knowledge distillation for semantic segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2604–2613, 2019

  11. [15]

    Deep learning model for imbalanced multi-label surface defect classification.Measurement Science and Technology, 33(3):035601, 2022

    Yang Liu, Yachao Yuan, and Jing Liu. Deep learning model for imbalanced multi-label surface defect classification.Measurement Science and Technology, 33(3):035601, 2022

  12. [16]

    Steel surface defect classification using deep residual neural network.Metals, 10(6):846, 2020

    Ihor Konovalenko, Pavlo Maruschak, Janette Brezinov´ a, J´ an Viˇ n´ aˇ s, and Jakub Brezina. Steel surface defect classification using deep residual neural network.Metals, 10(6):846, 2020

  13. [17]

    Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network.Optics and Lasers in Engineering, 122:294–302, 2019

    Yu He, Kechen Song, Hongwen Dong, and Yunhui Yan. Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network.Optics and Lasers in Engineering, 122:294–302, 2019

  14. [18]

    Esddnet: efficient small defect detection network of workpiece surface.Measurement Science and Technology, 33(10):105007, 2022

    Guodong Chen, Feng Xu, Guihua Liu, ChunMei Chen, Manlu Liu, Jing Zhang, and Xiaoming Niu. Esddnet: efficient small defect detection network of workpiece surface.Measurement Science and Technology, 33(10):105007, 2022

  15. [19]

    A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024

    Haipeng Fan and Zhongjun Qiu. A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024. 12 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al

  16. [20]

    Cadn: A weakly supervised learning-based category-aware object detection network for surface defect detection

    Jiabin Zhang, Hu Su, Wei Zou, Xinyi Gong, Zhengtao Zhang, and Fei Shen. Cadn: A weakly supervised learning-based category-aware object detection network for surface defect detection. Pattern Recognition, 109:107571, 2021

  17. [22]

    A multi-branch u-net for steel surface defect type and severity segmentation.Metals, 11(6):870, 2021

    Robby Neven and Toon Goedeme. A multi-branch u-net for steel surface defect type and severity segmentation.Metals, 11(6):870, 2021

  18. [23]

    Learning deep features for discriminative localization

    Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016

  19. [24]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. InProceedings of the IEEE international conference on computer vision, pages 618–626, 2017

  20. [25]

    Score-cam: Score-weighted visual explanations for convolutional neural networks

    Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, and Xia Hu. Score-cam: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 24–25, 2020

  21. [27]

    Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization

    Harish Guruprasad Ramaswamy et al. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. Inproceedings of the IEEE/CVF winter conference on applications of computer vision, pages 983–991, 2020

  22. [28]

    A generic automated surface defect detection based on a bilinear model.Applied Sciences, 9(15):3159, 2019

    Fei Zhou, Guihua Liu, Feng Xu, and Hao Deng. A generic automated surface defect detection based on a bilinear model.Applied Sciences, 9(15):3159, 2019

  23. [29]

    Region-aware cam: high-resolution weakly-supervised defect segmentation via salient region perception.arXiv preprint arXiv:2506.22866, 2025

    Hang-Cheng Dong, Lu Zou, Bingguo Liu, Dong Ye, and Guodong Liu. Region-aware cam: high-resolution weakly-supervised defect segmentation via salient region perception.arXiv preprint arXiv:2506.22866, 2025

  24. [30]

    Truncated and integrated class activation maps for weakly supervised defect detection.Measurement Science and Technology, 36(6):066138, 2025

    Hang-Cheng Dong, Bingguo Liu, Dong Ye, and Guodong Liu. Truncated and integrated class activation maps for weakly supervised defect detection.Measurement Science and Technology, 36(6):066138, 2025

  25. [31]

    Multitask learning.Machine learning, 28(1):41–75, 1997

    Rich Caruana. Multitask learning.Machine learning, 28(1):41–75, 1997

  26. [32]

    Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

    Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018

  27. [33]

    Knowledge distillation: A survey.International journal of computer vision, 129(6):1789–1819, 2021

    Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey.International journal of computer vision, 129(6):1789–1819, 2021

  28. [34]

    Attention and feature transfer based knowledge distillation.Scientific Reports, 13(1):18369, 2023

    Guoliang Yang, Shuaiying Yu, Yangyang Sheng, and Hao Yang. Attention and feature transfer based knowledge distillation.Scientific Reports, 13(1):18369, 2023

  29. [35]

    Training with noise is equivalent to tikhonov regularization.Neural computation, 7(1):108–116, 1995

    Chris M Bishop. Training with noise is equivalent to tikhonov regularization.Neural computation, 7(1):108–116, 1995

  30. [36]

    Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014

  31. [37]

    Full-gradient representation for neural network visualization

    Suraj Srinivas and Fran¸ cois Fleuret. Full-gradient representation for neural network visualization. Advances in neural information processing systems, 32, 2019. 13 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al

  32. [38]

    Layercam: Exploring hierarchical class activation maps for localization.IEEE transactions on image processing, 30:5875–5888, 2021

    Peng-Tao Jiang, Chang-Bin Zhang, Qibin Hou, Ming-Ming Cheng, and Yunchao Wei. Layercam: Exploring hierarchical class activation maps for localization.IEEE transactions on image processing, 30:5875–5888, 2021

  33. [41]

    f-anogan: Fast unsupervised anomaly detection with generative adversarial networks

    Thomas Schlegl, Philipp Seeb¨ ock, Sebastian M Waldstein, Georg Langs, and Ursula Schmidt- Erfurth. f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis, 54:30–44, 2019

  34. [42]

    Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings

    Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4183–4192, 2020

  35. [43]

    Segmentation-based deep-learning approach for surface-defect detection.Journal of Intelligent Manufacturing, 31(3):759–776, 2020

    Domen Tabernik, Samo ˇSela, Jure Skvarˇ c, and Danijel Skoˇ caj. Segmentation-based deep-learning approach for surface-defect detection.Journal of Intelligent Manufacturing, 31(3):759–776, 2020

  36. [44]

    Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129:103459, 2021

    Jakob Boˇ ziˇ c, Domen Tabernik, and Danijel Skoˇ caj. Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129:103459, 2021

  37. [45]

    Maminet: Memory- attended multi-inference network for surface-defect detection.Computers in Industry, 145:103834, 2023

    Xiaoyan Luo, Sen Li, Yu Wang, Tiancheng Zhan, Xiaofeng Shi, and Bo Liu. Maminet: Memory- attended multi-inference network for surface-defect detection.Computers in Industry, 145:103834, 2023

  38. [46]

    A method for surface defect detection based on multiscale feature fusion and pyramid attention.IEEE Access, 12:36457–36465, 2024

    Ying Tang, Hongyuan Wang, Qunying Zhou, and Boyan Sun. A method for surface defect detection based on multiscale feature fusion and pyramid attention.IEEE Access, 12:36457–36465, 2024. 14