Network Knowledge Prior Guided Learning for Data-Efficient Surface Defect Detection
Pith reviewed 2026-05-20 12:40 UTC · model grok-4.3
The pith
A knowledge-guided loss enforces consistency with prior saliency maps to improve defect detection accuracy and interpretability without added inference cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that training a model under an auxiliary consistency constraint between its own saliency maps and those produced by a separately trained primary network steers the model toward more robust feature representations, yielding higher accuracy and average precision on public defect datasets together with more concentrated and human-intelligible saliency maps, all without any increase in inference cost.
What carries the argument
The knowledge-guided loss term inside a multi-task learning setup that penalizes divergence between saliency maps of a final model and those generated by a primary classification network.
If this is right
- Baseline detectors achieve higher accuracy and average precision on multiple public surface defect datasets.
- The trained models produce saliency maps that are more concentrated and easier for humans to interpret.
- Interpretability is incorporated directly into training with zero extra cost at inference time.
- The consistency constraint acts as an effective regularizer that improves robustness in data-limited regimes.
Where Pith is reading between the lines
- The same consistency mechanism could be tested on other vision tasks that suffer from limited labels and require explainability, such as anomaly detection in medical images.
- Replacing saliency maps with other forms of explanation as the prior might strengthen or weaken the observed gains and should be measured directly.
- Combining the knowledge-guided loss with existing data-augmentation or semi-supervised techniques could produce further reductions in required training data.
Load-bearing premise
Saliency maps from the primary network constitute reliable prior knowledge that can be used to enforce consistency without introducing new biases or overfitting.
What would settle it
If retraining baseline detectors on the same public defect datasets with the added consistency loss produces no measurable gain in accuracy or AP, or if the resulting saliency maps are not visibly more concentrated than the baselines, the central claim would be falsified.
Figures
read the original abstract
Deep learning-based methods have become the de facto standard for industrial defect detection. However, their data-hungry nature and inherent "black-box" characteristics often lead to performance bottlenecks and limited trustworthiness in real-world applications. To address these challenges, this paper proposes a novel knowledge-guided loss function that seamlessly integrates model interpretability into the training process without incurring any additional inference cost. Our method operates in two phases: first, a primary classification network is trained, and its explanations, in the form of saliency maps, are generated as prior knowledge. Second, a multi-task learning framework is established, where the main task performs classification, and an auxiliary task imposes consistency between the saliency maps of the final model and the primary model. This consistency is enforced by a dedicated knowledge-guided loss term, effectively acting as a powerful regularizer to steer the model towards robust feature representations. Extensive experiments on multiple public defect datasets demonstrate that our approach consistently enhances the performance of baseline models in terms of accuracy and AP. Moreover, visual analysis reveals that the proposed method yields more concentrated and human-intelligible saliency maps. This work presents a simple yet effective paradigm for bridging the gap between model performance and interpretability, paving the way for more reliable and high-performing vision systems in industrial quality inspection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-phase approach for data-efficient surface defect detection: a primary classification network is first trained on limited defect data to produce saliency maps as prior knowledge; a multi-task framework is then used in which the main task performs classification while an auxiliary task applies a knowledge-guided loss to enforce consistency between the final model's saliency maps and those of the primary model, acting as a regularizer toward robust features. The abstract claims that this yields consistent accuracy and AP improvements over baselines on multiple public defect datasets together with more concentrated, human-intelligible saliency maps, all without extra inference cost.
Significance. If the empirical claims hold after proper validation, the work would offer a practical way to inject interpretability constraints into training for industrial vision tasks that suffer from data scarcity, potentially improving both detection performance and model trustworthiness in quality-inspection pipelines.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent gains in accuracy and AP' and 'more concentrated and human-intelligible saliency maps' is asserted without any numerical results, baseline names, dataset identifiers, or statistical significance tests appearing in the provided summary; this absence prevents assessment of whether the reported improvements are load-bearing or merely incremental.
- [§3] §3 (Method, knowledge-guided loss definition): the auxiliary loss directly penalizes deviation from saliency maps generated by the primary network; because those maps are themselves produced by a model trained on the same limited data, the approach risks propagating noisy or background-biased explanations rather than robust defect cues. No ablation that replaces the primary maps with randomized or degraded versions is described, leaving open whether any auxiliary consistency term would produce similar gains.
minor comments (2)
- [§3.2] Clarify the precise mathematical form of the knowledge-guided loss (e.g., L1, KL divergence, or cosine similarity) and its weighting hyper-parameter relative to the classification loss.
- [§4.4] Add quantitative saliency evaluation metrics (e.g., pointing-game accuracy or IoU against defect annotations) alongside the qualitative visual comparisons.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below, providing clarifications based on the full experimental section and offering targeted revisions where appropriate to strengthen the presentation.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent gains in accuracy and AP' and 'more concentrated and human-intelligible saliency maps' is asserted without any numerical results, baseline names, dataset identifiers, or statistical significance tests appearing in the provided summary; this absence prevents assessment of whether the reported improvements are load-bearing or merely incremental.
Authors: The full manuscript in Section 4 reports quantitative results across multiple public datasets (NEU-DET, DAGM, and MVTec AD), comparing against several baselines including standard CNN classifiers and prior defect detection methods. Tables present mean accuracy and AP values with standard deviations over multiple runs, along with paired t-test p-values confirming statistical significance of the gains. The abstract summarizes these outcomes at a high level for brevity, as is standard practice; we will revise the abstract to explicitly reference the detailed tables and datasets in Section 4 for improved clarity. revision: partial
-
Referee: [§3] §3 (Method, knowledge-guided loss definition): the auxiliary loss directly penalizes deviation from saliency maps generated by the primary network; because those maps are themselves produced by a model trained on the same limited data, the approach risks propagating noisy or background-biased explanations rather than robust defect cues. No ablation that replaces the primary maps with randomized or degraded versions is described, leaving open whether any auxiliary consistency term would produce similar gains.
Authors: The primary network is trained end-to-end on the limited defect data specifically for classification, and its saliency maps are shown in Section 4 to highlight defect regions more effectively than the final model without the proposed loss. The knowledge-guided consistency term is intended to regularize toward these task-relevant cues rather than arbitrary noise. We agree that an ablation replacing the primary saliency maps with randomized or degraded versions would strengthen the claim that the specific prior is beneficial; we will add this control experiment to the revised Section 4, demonstrating that random maps yield no improvement or even degradation relative to the proposed method. revision: yes
Circularity Check
No significant circularity in the two-phase knowledge-guided training method.
full rationale
The paper proposes a two-stage procedure in which a primary classification network is trained independently on the defect data to produce saliency maps, which are then used as fixed prior targets in a knowledge-guided consistency loss for a subsequent multi-task model. No equations or derivations are presented that reduce the final performance claims or saliency improvements to the inputs by construction; the primary maps are generated from a separate training run before being applied as guidance. Experiments on public datasets supply external validation, and no load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to force the result. The method is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Saliency maps from a primary classification network provide meaningful and transferable prior knowledge for guiding feature learning in surface defect detection.
invented entities (1)
-
knowledge-guided loss function
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
a multi-task learning framework is established, where the main task performs classification, and an auxiliary task imposes consistency between the saliency maps of the final model and the primary model. This consistency is enforced by a dedicated knowledge-guided loss term
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the proposed method yields more concentrated and human-intelligible saliency maps
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Haipeng Fan and Zhongjun Qiu. A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024
work page 2024
-
[2]
Zhiqiang Chen, Daxing Xu, Jiehang Deng, Yi Chen, and Chuan Li. Comparative study on deep-learning-based leather surface defect identification.Measurement Science and Technology, 35(1):015402, 2024
work page 2024
-
[3]
Zhou Ping, Zhang Chuangchuang, Zhou Gongbo, He Zhenzhi, Yan Xiaodong, Wang Shihao, Sun Meng, and Hu Bing. Whole surface defect detection method for bearing rings based on machine vision.Measurement Science and Technology, 34(1):015017, 2023
work page 2023
-
[4]
Deep learning.nature, 521(7553):436–444, 2015
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015. 11 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al Table 6: Ablation on dataset KolektorSDD2 (AP in %). MethodsN a = 0N a = 16N a = 53N a = 126N a = 246(N all ) FullGrad 68.91 88.33 90.56 93.57 93.96 LayerCAM 68.91 90.88 90.47 92.53 93.94
work page 2015
-
[5]
Daniel Weimer, Bernd Scholz-Reiter, and Moshe Shpitalni. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection.CIRP annals, 65(1):417–420, 2016
work page 2016
-
[7]
Kuikui He, Xiaotao Liu, Jing Liu, and Peng Wu. A multitask learning-based neural network for defect detection on textured surfaces under weak supervision.IEEE transactions on instrumen- tation and measurement, 70:1–14, 2021
work page 2021
-
[9]
A survey on image data augmentation for deep learning.Journal of big data, 6(1):1–48, 2019
Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning.Journal of big data, 6(1):1–48, 2019
work page 2019
-
[10]
Amina Adadi and Mohammed Berrada. Peeking inside the black-box: a survey on explainable artificial intelligence (xai).IEEE access, 6:52138–52160, 2018
work page 2018
-
[11]
Alejandro Barredo Arrieta, Natalia D´ ıaz-Rodr´ ıguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garc´ ıa, Sergio Gil-L´ opez, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai.Information fusion, 58:82–115, 2020
work page 2020
-
[14]
Structured knowledge distillation for semantic segmentation
Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, and Jingdong Wang. Structured knowledge distillation for semantic segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2604–2613, 2019
work page 2019
-
[15]
Yang Liu, Yachao Yuan, and Jing Liu. Deep learning model for imbalanced multi-label surface defect classification.Measurement Science and Technology, 33(3):035601, 2022
work page 2022
-
[16]
Steel surface defect classification using deep residual neural network.Metals, 10(6):846, 2020
Ihor Konovalenko, Pavlo Maruschak, Janette Brezinov´ a, J´ an Viˇ n´ aˇ s, and Jakub Brezina. Steel surface defect classification using deep residual neural network.Metals, 10(6):846, 2020
work page 2020
-
[17]
Yu He, Kechen Song, Hongwen Dong, and Yunhui Yan. Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network.Optics and Lasers in Engineering, 122:294–302, 2019
work page 2019
-
[18]
Guodong Chen, Feng Xu, Guihua Liu, ChunMei Chen, Manlu Liu, Jing Zhang, and Xiaoming Niu. Esddnet: efficient small defect detection network of workpiece surface.Measurement Science and Technology, 33(10):105007, 2022
work page 2022
-
[19]
Haipeng Fan and Zhongjun Qiu. A novel deep learning algorithm applied to machine vision inspection for surface defects of injection moulded products.Measurement Science and Technology, 35(4):046003, 2024. 12 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al
work page 2024
-
[20]
Jiabin Zhang, Hu Su, Wei Zou, Xinyi Gong, Zhengtao Zhang, and Fei Shen. Cadn: A weakly supervised learning-based category-aware object detection network for surface defect detection. Pattern Recognition, 109:107571, 2021
work page 2021
-
[22]
A multi-branch u-net for steel surface defect type and severity segmentation.Metals, 11(6):870, 2021
Robby Neven and Toon Goedeme. A multi-branch u-net for steel surface defect type and severity segmentation.Metals, 11(6):870, 2021
work page 2021
-
[23]
Learning deep features for discriminative localization
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016
work page 2016
-
[24]
Grad-cam: Visual explanations from deep networks via gradient-based localization
Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. InProceedings of the IEEE international conference on computer vision, pages 618–626, 2017
work page 2017
-
[25]
Score-cam: Score-weighted visual explanations for convolutional neural networks
Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, and Xia Hu. Score-cam: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 24–25, 2020
work page 2020
-
[27]
Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization
Harish Guruprasad Ramaswamy et al. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. Inproceedings of the IEEE/CVF winter conference on applications of computer vision, pages 983–991, 2020
work page 2020
-
[28]
Fei Zhou, Guihua Liu, Feng Xu, and Hao Deng. A generic automated surface defect detection based on a bilinear model.Applied Sciences, 9(15):3159, 2019
work page 2019
-
[29]
Hang-Cheng Dong, Lu Zou, Bingguo Liu, Dong Ye, and Guodong Liu. Region-aware cam: high-resolution weakly-supervised defect segmentation via salient region perception.arXiv preprint arXiv:2506.22866, 2025
-
[30]
Hang-Cheng Dong, Bingguo Liu, Dong Ye, and Guodong Liu. Truncated and integrated class activation maps for weakly supervised defect detection.Measurement Science and Technology, 36(6):066138, 2025
work page 2025
-
[31]
Multitask learning.Machine learning, 28(1):41–75, 1997
Rich Caruana. Multitask learning.Machine learning, 28(1):41–75, 1997
work page 1997
-
[32]
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018
work page 2018
-
[33]
Knowledge distillation: A survey.International journal of computer vision, 129(6):1789–1819, 2021
Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey.International journal of computer vision, 129(6):1789–1819, 2021
work page 2021
-
[34]
Attention and feature transfer based knowledge distillation.Scientific Reports, 13(1):18369, 2023
Guoliang Yang, Shuaiying Yu, Yangyang Sheng, and Hao Yang. Attention and feature transfer based knowledge distillation.Scientific Reports, 13(1):18369, 2023
work page 2023
-
[35]
Training with noise is equivalent to tikhonov regularization.Neural computation, 7(1):108–116, 1995
Chris M Bishop. Training with noise is equivalent to tikhonov regularization.Neural computation, 7(1):108–116, 1995
work page 1995
-
[36]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting.The journal of machine learning research, 15(1):1929–1958, 2014
work page 1929
-
[37]
Full-gradient representation for neural network visualization
Suraj Srinivas and Fran¸ cois Fleuret. Full-gradient representation for neural network visualization. Advances in neural information processing systems, 32, 2019. 13 IOP PublishingJournalvv(yyyy) aaaaaa Authoret al
work page 2019
-
[38]
Peng-Tao Jiang, Chang-Bin Zhang, Qibin Hou, Ming-Ming Cheng, and Yunchao Wei. Layercam: Exploring hierarchical class activation maps for localization.IEEE transactions on image processing, 30:5875–5888, 2021
work page 2021
-
[41]
f-anogan: Fast unsupervised anomaly detection with generative adversarial networks
Thomas Schlegl, Philipp Seeb¨ ock, Sebastian M Waldstein, Georg Langs, and Ursula Schmidt- Erfurth. f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis, 54:30–44, 2019
work page 2019
-
[42]
Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings
Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4183–4192, 2020
work page 2020
-
[43]
Domen Tabernik, Samo ˇSela, Jure Skvarˇ c, and Danijel Skoˇ caj. Segmentation-based deep-learning approach for surface-defect detection.Journal of Intelligent Manufacturing, 31(3):759–776, 2020
work page 2020
-
[44]
Jakob Boˇ ziˇ c, Domen Tabernik, and Danijel Skoˇ caj. Mixed supervision for surface-defect detection: From weakly to fully supervised learning.Computers in Industry, 129:103459, 2021
work page 2021
-
[45]
Xiaoyan Luo, Sen Li, Yu Wang, Tiancheng Zhan, Xiaofeng Shi, and Bo Liu. Maminet: Memory- attended multi-inference network for surface-defect detection.Computers in Industry, 145:103834, 2023
work page 2023
-
[46]
Ying Tang, Hongyuan Wang, Qunying Zhou, and Boyan Sun. A method for surface defect detection based on multiscale feature fusion and pyramid attention.IEEE Access, 12:36457–36465, 2024. 14
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.