Deep Fault Diagnosis for Rotating Machinery with Scarce Labeled Samples
Pith reviewed 2026-05-24 22:07 UTC · model grok-4.3
The pith
SVM models trained on spectrogram features can pseudo-label unlabeled samples, letting a 2D CNN diagnose rotating machinery faults more accurately when labeled data is scarce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DFD works in three phases: spectrograms of raw signals yield a pool of time-frequency features; candidate SVMs trained on feature combinations with scarce labels are ranked on a validation set and the top models predict labels for unlabeled data; the resulting augmented training set trains a 2D CNN that outperforms both the original SVMs and a vanilla CNN trained only on the scarce labels.
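The second phase (train one candidate per feature combination, then rank on a validation set) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy nearest-centroid classifier stands in for the SVMs, because the kernels, hyperparameters, and feature definitions are not given here. The names `fit_centroids`, `predict`, `select`, and `rank_feature_subsets` are hypothetical.

```python
from itertools import combinations
from math import dist


def fit_centroids(X, y):
    """Stand-in for SVM training (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(model, X):
    """Assign each row to the class with the nearest centroid."""
    return [min(model, key=lambda c: dist(model[c], row)) for row in X]


def select(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in X]


def rank_feature_subsets(X_tr, y_tr, X_val, y_val, subset_size):
    """Train one candidate per feature subset and sort by validation accuracy."""
    ranked = []
    for cols in combinations(range(len(X_tr[0])), subset_size):
        model = fit_centroids(select(X_tr, cols), y_tr)
        preds = predict(model, select(X_val, cols))
        acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
        ranked.append((acc, cols, model))
    ranked.sort(key=lambda r: r[0], reverse=True)  # best candidates first
    return ranked
```

The top entries of the returned list play the role of the selected teachers. With scarce labels the validation accuracies are coarse, so ties between candidates are likely.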
What carries the argument
The augmented training set (ATS) formed by combining scarce labeled samples with pseudo-label predictions from the highest-performing SVM models.
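As a rough sketch of how such a set might be assembled, assuming a teacher is any callable that maps a sample to a label: the function name `build_augmented_training_set` and the `"gold"`/`"pseudo"` provenance tags are illustrative additions, not part of the paper.

```python
def build_augmented_training_set(labeled, unlabeled, teacher_predict):
    """Combine scarce labeled samples with teacher-predicted pseudo-labels.

    labeled: list of (sample, label) pairs.
    unlabeled: list of samples.
    teacher_predict: callable sample -> label (hypothetical teacher interface).
    Each output triple carries a provenance tag so that pseudo-labels can be
    down-weighted or audited later.
    """
    gold = [(x, y, "gold") for x, y in labeled]
    pseudo = [(x, teacher_predict(x), "pseudo") for x in unlabeled]
    return gold + pseudo
```

Keeping the provenance tag is a design choice for this sketch; it makes an ablation with and without the pseudo-labeled part a one-line filter.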
If this is right
- The CNN learns more discriminative features than either the SVMs or a CNN trained solely on scarce labels.
- Diagnostic performance exceeds that of the selected SVM models alone.
- The overall pipeline is computationally efficient, which the abstract describes as promising for real-time monitoring.
- The method transfers diagnostic expertise encoded in hand-crafted features into the deep network without requiring additional manual labeling.
Where Pith is reading between the lines
- The same pseudo-labeling step could be applied to other vibration or sensor-based classification problems where domain-specific features are already well studied.
- If SVM predictions vary across runs, adding a confidence threshold or ensemble voting on the pseudo-labels might reduce noise in the augmented set.
- Extending the feature pool with additional signal-processing techniques could increase the chance of selecting even stronger SVM teachers.
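The ensemble-voting idea in the second point could look like the sketch below. It is an assumption about how such filtering might be done, not something the paper reports; `vote_pseudo_labels` and `min_agreement` are hypothetical names.

```python
from collections import Counter


def vote_pseudo_labels(predictions, min_agreement=0.75):
    """Keep a pseudo-label only when enough teachers agree on it.

    predictions: one list of labels per teacher, aligned by sample index.
    Returns {sample_index: label} for samples whose majority label reaches
    the required share of votes; other samples are left unlabeled.
    """
    kept = {}
    for i, votes in enumerate(zip(*predictions)):
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            kept[i] = label
    return kept
```

Raising `min_agreement` trades a smaller augmented set for cleaner pseudo-labels.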
Load-bearing premise
The predictions from the selected SVM models on unlabeled samples are accurate enough to serve as reliable pseudo-labels that improve CNN training rather than introduce harmful errors.
What would settle it
If a CNN trained on the augmented training set shows lower test accuracy than a vanilla CNN trained only on the original scarce labeled samples, the central claim does not hold.
Original abstract
Early and accurately detecting faults in rotating machinery is crucial for operation safety of the modern manufacturing system. In this paper, we proposed a novel Deep fault diagnosis (DFD) method for rotating machinery with scarce labeled samples. DFD tackles the challenging problem by transferring knowledge from shallow models, which is based on the idea that shallow models trained with different hand-crafted features can reveal the latent prior knowledge and diagnostic expertise and have good generalization ability even with scarce labeled samples. DFD can be divided into three phases. First, a spectrogram of the raw vibration signal is calculated by applying a Short-time Fourier transform (STFT). From those spectrograms, discriminative time-frequency domain features can be extracted and used to form a feature pool. Then, several candidate Support vector machine (SVM) models are trained with different combinations of features in the feature pool with scarce labeled samples. By evaluating the pretrained SVM models on the validation set, the most discriminative features and best-performed SVM models can be selected, which are used to make predictions on the unlabeled samples. The predicted labels reserve the expert knowledge originally carried by the SVM model. They are combined together with the scarce fine labeled samples to form an Augmented training set (ATS). Finally, a novel 2D deep Convolutional neural network (CNN) model is trained on the ATS to learn more discriminative features and a better classifier. Experimental results on two fault diagnosis datasets demonstrate the effectiveness of the proposed DFD, which achieves better performance than SVM models and the vanilla deep CNN model trained on scarce labeled samples. Moreover, it is computationally efficient and is promising for real-time rotating machinery fault diagnosis.
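The first phase mentioned in the abstract, a spectrogram computed with the STFT, can be illustrated with a dependency-free sketch. The frame length, hop size, and Hann window are assumptions chosen for the example; the abstract does not state the paper's settings, and in practice a library routine such as `scipy.signal.stft` would normally replace this naive DFT.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 magnitudes (non-negative frequencies).
    A periodic Hann window is applied before a naive O(N^2) DFT.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames
```

Time-frequency features for the feature pool would then be computed from these frames, for example per-band energies; which features the paper actually uses is not specified here.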
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Deep Fault Diagnosis (DFD) method for rotating machinery under scarce labeled samples. It computes STFT spectrograms, extracts a pool of time-frequency features, trains multiple SVMs on different feature combinations using the limited labels, selects the best SVMs via validation performance, generates pseudo-labels on unlabeled samples, augments the training set, and trains a 2D CNN on the augmented set. The central claim is that this yields better fault diagnosis performance than standalone SVMs or a vanilla CNN trained only on the scarce labels, demonstrated on two datasets.
Significance. If the empirical gains are robust, the approach offers a practical semi-supervised strategy for industrial fault diagnosis by injecting diagnostic expertise from shallow models into deep learning when labels are expensive to obtain. It targets a real constraint in rotating machinery monitoring and could inform other domains with limited supervision, provided the pseudo-label mechanism is shown to be reliable rather than a source of noise.
major comments (2)
- [method description (DFD phases)] The method description (three phases of DFD): the performance gain over the vanilla CNN is attributed to the pseudo-labels generated by the selected SVMs, yet no measurement of pseudo-label accuracy on the unlabeled samples is reported, nor is there an ablation that trains the CNN with and without the augmented set. With very few labeled samples the validation set used for SVM selection is necessarily small, so the assumption that the selected SVMs produce sufficiently accurate pseudo-labels on the target distribution is unverified and load-bearing for the central claim.
- [Experimental results] Experimental results section: the abstract and results assert superior performance on two fault diagnosis datasets, but the manuscript supplies no concrete metrics (accuracy, F1, etc.), sample sizes, number of labeled vs. unlabeled samples, cross-validation protocol, or statistical significance tests. Without these details the central empirical claim cannot be evaluated.
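One way to approximate the missing pseudo-label accuracy measurement is to hold out part of the labeled data and score the teacher on it. The sketch below assumes a hypothetical `train_and_predict(train, samples)` callable and is only a proxy: with very few labels the estimate has high variance, and it says nothing about distribution shift between labeled and unlabeled samples.

```python
import random


def estimate_pseudo_label_accuracy(labeled, train_and_predict,
                                   holdout_frac=0.3, seed=0):
    """Estimate teacher accuracy on a held-out slice of the labeled data.

    labeled: list of (sample, label) pairs.
    train_and_predict: callable (train_pairs, samples) -> list of labels
        (hypothetical interface for the selected teacher).
    """
    rng = random.Random(seed)
    shuffled = labeled[:]
    rng.shuffle(shuffled)
    n_hold = max(1, int(len(shuffled) * holdout_frac))
    held, train = shuffled[:n_hold], shuffled[n_hold:]
    preds = train_and_predict(train, [x for x, _ in held])
    correct = sum(p == y for p, (_, y) in zip(preds, held))
    return correct / n_hold
```

Repeating the call over several seeds and reporting the spread would give a more honest picture than a single split.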
minor comments (2)
- [abstract] The abstract states 'a novel 2D deep Convolutional neural network (CNN) model' with inconsistent capitalization of 'Convolutional'.
- [method description] Notation for the Augmented training set (ATS) is introduced but not used consistently in later sections when describing the CNN training.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly.
Point-by-point responses
- Referee: [method description (DFD phases)] The method description (three phases of DFD): the performance gain over the vanilla CNN is attributed to the pseudo-labels generated by the selected SVMs, yet no measurement of pseudo-label accuracy on the unlabeled samples is reported, nor is there an ablation that trains the CNN with and without the augmented set. With very few labeled samples the validation set used for SVM selection is necessarily small, so the assumption that the selected SVMs produce sufficiently accurate pseudo-labels on the target distribution is unverified and load-bearing for the central claim.
  Authors: We agree an ablation study would strengthen the evidence. We will add a comparison of CNN performance trained on scarce labels alone versus the augmented set in the revision. Direct pseudo-label accuracy cannot be computed without ground-truth labels on the unlabeled data; downstream CNN gains provide the supporting evidence. SVM selection employs k-fold cross-validation on the labeled samples to address the small validation set concern.
  Revision: partial
- Referee: [Experimental results] Experimental results section: the abstract and results assert superior performance on two fault diagnosis datasets, but the manuscript supplies no concrete metrics (accuracy, F1, etc.), sample sizes, number of labeled vs. unlabeled samples, cross-validation protocol, or statistical significance tests. Without these details the central empirical claim cannot be evaluated.
  Authors: We will revise the experimental section to explicitly report all metrics (accuracy, F1), sample sizes (labeled vs. unlabeled per dataset), the cross-validation protocol, and any statistical significance tests. These details exist in our experiments and will be clearly presented.
  Revision: yes
- Direct measurement of pseudo-label accuracy on the unlabeled samples, as ground-truth labels are unavailable by definition.
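The rebuttal mentions k-fold cross-validation on the labeled samples for SVM selection. A minimal sketch of a stratified splitter is given below, assuming stratification by class is wanted so that every fold sees each fault type; the rebuttal does not say whether the authors stratify, and `stratified_kfold` is a hypothetical name.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_idx, val_idx) pairs with each class spread over k folds.

    labels: list of class labels, one per labeled sample.
    Indices of each class are dealt round-robin into the folds.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

Averaging each candidate's validation accuracy over the folds gives a less noisy ranking than a single small validation split, though the folds still share the same scarce pool of labels.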
Circularity Check
No significant circularity; empirical pipeline validated externally
The paper describes a three-phase empirical method (STFT spectrograms, multi-SVM feature selection for pseudo-labeling unlabeled samples, then CNN training on the augmented set) whose performance claims rest on experiments on two external fault diagnosis datasets. No equations, derivations, or self-citations reduce the claimed gains to quantities defined by the method itself. The pseudo-label accuracy assumption is a validity risk but does not create circularity in the derivation chain. This is the most common honest finding for standard ML pipelines.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Shallow models trained with different hand-crafted features can reveal the latent prior knowledge and diagnostic expertise and have good generalization ability even with scarce labeled samples.