Towards a more realistic evaluation of machine learning models for bearing fault diagnosis

Danilo Silva; João Paulo Vieira; Rodrigo Kobashikawa Rosa; Victor Afonso Bauler

arxiv: 2509.22267 · v4 · pith:7YGU22KUnew · submitted 2025-09-26 · 💻 cs.LG · eess.SP

Pith reviewed 2026-05-21 21:16 UTC · model grok-4.3

keywords: bearing fault diagnosis · data leakage · machine learning evaluation · vibration signals · dataset partitioning · multi-label classification · generalization

The pith

Common ways of splitting bearing vibration data let models train and test on the same physical bearings, creating leakage that inflates accuracy scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that segment-wise and condition-wise splits of bearing datasets allow models to exploit correlations from identical physical components appearing in both training and test data. This leakage produces performance numbers that do not reflect how the models would behave when deployed on new bearings. The proposed fix is a bearing-wise split that assigns entire bearings to either training or testing only, paired with a multi-label formulation of the fault task and ROC-based metrics that ignore fault prevalence. Experiments across four public datasets further indicate that the sheer number of distinct training bearings is the dominant factor controlling generalization. The work supplies concrete partitioning rules and validation practices intended to produce more trustworthy assessments of diagnostic models.

Core claim

The authors argue that leakage from non-bearing-wise partitions is the primary cause of overstated results in existing machine learning studies of bearing faults. They show that enforcing a partition by physical bearing, reformulating the problem as multi-label classification, and adopting prevalence-independent ROC metrics produces substantially lower yet more realistic performance figures, with the count of unique training bearings emerging as the decisive variable for generalization.

What carries the argument

Bearing-wise data partitioning that places all measurements from any single physical bearing exclusively into the training set or the test set.
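
The partition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bearing_wise_split`, the `test_fraction` parameter, and the toy bearing identifiers are all hypothetical, and the sketch does not stratify by health state.

```python
import random
from collections import defaultdict


def bearing_wise_split(bearing_ids, test_fraction=0.4, seed=0):
    """Split segment indices so each bearing lands wholly in train or test.

    bearing_ids: one (hypothetical) bearing identifier per signal segment.
    Returns (train_indices, test_indices).
    """
    # Group segment indices by the physical bearing they came from.
    by_bearing = defaultdict(list)
    for idx, bearing in enumerate(bearing_ids):
        by_bearing[bearing].append(idx)

    # Shuffle the bearings, not the individual segments.
    bearings = sorted(by_bearing)
    random.Random(seed).shuffle(bearings)
    n_test = max(1, round(test_fraction * len(bearings)))

    test_idx = [i for b in bearings[:n_test] for i in by_bearing[b]]
    train_idx = [i for b in bearings[n_test:] for i in by_bearing[b]]
    return sorted(train_idx), sorted(test_idx)


# Toy data (made up): 5 bearings with 4 segments each.
bearing_ids = [f"B{b}" for b in range(5) for _ in range(4)]
train_idx, test_idx = bearing_wise_split(bearing_ids, test_fraction=0.4, seed=0)

train_bearings = {bearing_ids[i] for i in train_idx}
test_bearings = {bearing_ids[i] for i in test_idx}
print(sorted(train_bearings), sorted(test_bearings))
```

A segment-wise split would instead shuffle the 20 segment indices directly, which almost always puts segments from the same bearing on both sides of the split.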

If this is right

Reported accuracies for bearing fault models will drop once leakage is removed, but the remaining performance will be a better predictor of behavior on unseen equipment.
Collecting data from a larger number of distinct bearings becomes the most effective way to improve generalization rather than refining model architecture alone.
Multi-label classification allows simultaneous detection of co-occurring fault types that single-label setups miss.
ROC-based metrics give a clearer comparison across datasets that differ in fault prevalence.
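
The last two points can be illustrated with a small stdlib-only sketch. It uses the binary encoding given in the Figure 2 caption (healthy as [0,0], inner race as [1,0], outer race as [0,1]); the [1,1] vector for a co-occurring fault, the helper names, and the scores are illustrative assumptions rather than values from the paper.

```python
def encode_faults(faults, fault_types=("inner", "outer")):
    """Return a binary vector with one entry per fault type."""
    return [1 if f in faults else 0 for f in fault_types]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, y_score):
    """Unweighted mean of the per-label AUROCs."""
    n_labels = len(y_true[0])
    per_label = [
        auroc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


y_true = [
    encode_faults(set()),               # healthy
    encode_faults({"inner"}),           # inner race fault
    encode_faults({"outer"}),           # outer race fault
    encode_faults({"inner", "outer"}),  # co-occurring faults (assumed encoding)
]
# Made-up model scores, one column per fault type.
y_score = [
    [0.1, 0.2],
    [0.9, 0.3],
    [0.2, 0.8],
    [0.7, 0.1],  # the outer race fault is missed on this sample
]
score = macro_auroc(y_true, y_score)
print(score)
```

Because each per-label AUROC depends only on how positives rank against negatives, duplicating every healthy sample would leave the result unchanged, which is the sense in which the metric is prevalence-independent.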

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same leakage patterns probably appear in other sensor-based diagnosis tasks that reuse measurements from the same physical units.
Public benchmark datasets may need to be expanded with many more independent bearings before they can support claims of robust industrial performance.
Engineers may have to gather site-specific bearing data rather than rely solely on existing public collections for reliable deployment.

Load-bearing premise

The bearings within each dataset are sufficiently independent that removing any one of them from the training pool still leaves a representative range of fault behaviors for real-world use.

What would settle it

If models retrained under a strict bearing-wise split on the same four datasets recover the high accuracies previously reported in the literature, the claim that leakage was the main driver of inflated performance would be refuted.

Figures

Figures reproduced from arXiv: 2509.22267 by Danilo Silva, João Paulo Vieira, Rodrigo Kobashikawa Rosa, Victor Afonso Bauler.

**Figure 1.** Comparison of Decision Tree (DT) and Logistic Regression (LR) accuracy across varying numbers of training bearings, evaluated under two conditions: a leakage-free test set (Valid) and a test set with data leakage (Leakage).

**Figure 2.** Exemplary bearing-level data partitioning for the generic dataset. The training set (green) and test set (blue) are disjoint at the bearing level, with a 3:2 allocation of bearings per health state. Under our multi-label framework, these states are represented by binary vectors, where a healthy bearing is encoded as [0,0], an inner race fault as [1,0], and an outer race fault as [0,1]. The partitioning of …

**Figure 3.** Specification of the generic bearing fault dataset, comprising 15 unique bearings, two fault modes (inner, outer), and two distinct acquisition configurations per bearing.

**Figure 4.** Schematic of the Double Cross-Validation (CVM-CV) protocol applied to the UORED-VAFCLS dataset. A distinct set of 5 bearing-level splits is used for hyperparameter tuning, while a separate set of 100 splits is used for final performance evaluation.

**Figure 5.** Schematic of the Double Cross-Validation (CVM-CV) protocol applied to the PU dataset.

**Figure 6.** Bearing configurations used in the CWRU dataset. Each cell represents a specific acquisition setup containing two bearings, one on the drive-end and one on the fan-end. Fault types are denoted as follows: I for inner race fault, O for outer race fault, B for ball fault, and H for healthy.

**Figure 7.** Illustration of the 3-fold split in the CWRU dataset following our proposed hyperparameter optimization methodology. Each cell contains two bearings: one on the drive-end and one on the fan-end.

**Figure 8.** Example of the two healthy-bearing split scenarios in our proposed methodology.

**Figure 9.** Impact of train-test split ratio on model performance on the UORED-VAFCLS dataset. The plot shows the mean Macro AUROC (and standard deviation as error bars) calculated over 100 evaluation splits for four distinct bearing-level train-to-test ratios.

**Figure 10.** Impact of train-test split ratio on model performance on the PU dataset using envelope spectrum as the input representation.

**Figure 11.** Impact of train-test split ratio on model performance on the CWRU dataset using Random Forest with handcrafted features.

**Figure 12.** CWRU leakage experiment groups.
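
The Figure 4 caption describes one set of bearing-level splits for hyperparameter tuning and a separate set for final evaluation. A rough sketch of how two such non-overlapping sets of splits could be drawn is shown here. The helper `sample_distinct_splits`, the 12 toy bearings, and the 8:4 ratio are hypothetical, and reading "separate" as "no split shared between the two sets" is an assumption about the protocol.

```python
import random


def sample_distinct_splits(bearings, n_test, n_splits, seed, exclude=frozenset()):
    """Draw n_splits distinct bearing-level splits, skipping any in `exclude`.

    Each split is identified by the frozenset of bearings held out for testing;
    the remaining bearings form the training set.
    """
    rng = random.Random(seed)
    chosen = set()
    attempts = 0
    while len(chosen) < n_splits:
        attempts += 1
        if attempts > 100_000:
            raise RuntimeError("not enough distinct splits available")
        test_set = frozenset(rng.sample(bearings, n_test))
        if test_set not in exclude:
            chosen.add(test_set)
    return chosen


# Toy setup (made up): 12 bearings, 4 held out per split.
bearings = [f"B{i}" for i in range(12)]

# 5 splits for hyperparameter tuning, 100 different splits for evaluation.
tuning_splits = sample_distinct_splits(bearings, n_test=4, n_splits=5, seed=1)
evaluation_splits = sample_distinct_splits(
    bearings, n_test=4, n_splits=100, seed=2, exclude=tuning_splits
)
print(len(tuning_splits), len(evaluation_splits))
```

Hyperparameters would be chosen by averaging a validation metric over `tuning_splits`, and the selected configuration would then be retrained and scored on each split in `evaluation_splits`; that training loop is omitted here.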

Original abstract

Reliable detection of bearing faults is essential for maintaining the safety and operational efficiency of rotating machinery. While recent advances in machine learning (ML), particularly deep learning, have shown strong performance in controlled settings, many studies fail to generalize to real-world applications due to methodological flaws, most notably data leakage. This paper investigates the issue of data leakage in vibration-based bearing fault diagnosis and its impact on model evaluation. We demonstrate that common dataset partitioning strategies, such as segment-wise and condition-wise splits, introduce spurious correlations that inflate performance metrics. To address this, we propose a rigorous, leakage-free evaluation methodology centered on bearing-wise data partitioning, ensuring no overlap between the physical components used for training and testing. Additionally, we reformulate the classification task as a multi-label problem, enabling the detection of co-occurring fault types and the use of prevalence-independent metrics based on the ROC curve. Beyond preventing leakage, we also examine the effect of dataset diversity on generalization, showing that the number of unique training bearings is a decisive factor for achieving robust performance. We evaluate our methodology on four widely adopted datasets: CWRU, Paderborn University (PU), University of Ottawa (UORED-VAFCLS) and HUST bearing. This study highlights the importance of leakage-aware evaluation protocols and provides practical guidelines for dataset partitioning, model selection, and validation, fostering the development of more trustworthy ML systems for industrial fault diagnosis applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Standard splits in bearing fault datasets leak information across the same physical bearings and inflate results; bearing-wise splits expose the gap.

The main thing to know is that common ways of splitting vibration data for bearing fault models let the same physical bearing appear in both training and test sets. This creates spurious correlations that make reported accuracy look stronger than it really is. The paper shows this pattern on CWRU, Paderborn, UORED-VAFCLS, and HUST, with clear drops once you enforce bearing-wise hold-outs instead of segment-wise or condition-wise splits. They also reframe the task as multi-label so models can flag co-occurring faults and switch to prevalence-independent ROC metrics. That combination is the concrete contribution here. They further check that performance scales with the number of distinct training bearings, which matches what you would expect for real generalization. The internal logic holds up without contradictions, and the proposed check directly targets the leakage source they describe. The soft spot is the assumption that holding out entire bearings leaves a task that still represents factory conditions; if bearings in a given dataset are already quite similar, the split might be stricter than deployment reality requires. The evidence for the size of the performance drops would be stronger with full tables and any significance checks, but the direction of the effect is consistent across the four datasets. This is useful for anyone who trains or reviews ML models for predictive maintenance on rotating equipment. Readers who care about evaluation protocols in industrial settings will take away practical partitioning guidelines. It deserves a serious referee because it flags a methodological issue that affects reliability claims in the area.

Referee Report

1 major / 3 minor

Summary. The manuscript claims that common dataset partitioning strategies such as segment-wise and condition-wise splits in vibration-based bearing fault diagnosis introduce spurious correlations and data leakage, inflating ML model performance metrics. It proposes a bearing-wise partitioning scheme that ensures no physical bearing overlap between training and test sets, reformulates the task as multi-label classification to handle co-occurring faults with prevalence-independent ROC metrics, and reports that performance drops under this split while improving as the number of unique training bearings increases. The approach is evaluated on the CWRU, Paderborn University (PU), UORED-VAFCLS, and HUST datasets.

Significance. If the empirical findings hold, the work is significant for promoting more trustworthy evaluation protocols in industrial ML applications. It directly targets a known source of over-optimism in the bearing fault diagnosis literature by demonstrating concrete performance gaps on four public datasets and by linking generalization to training-set diversity. The multi-label reformulation and emphasis on leakage-free splits provide actionable guidelines that could improve reproducibility and real-world applicability.

major comments (1)

The central claim that bearing-wise splits remove leakage rests on the assumption that physical bearings constitute the dominant source of independent variation; the manuscript should explicitly test or discuss whether shared manufacturing batches or sensor mounting effects across bearings could still induce correlations even after a bearing-wise split (see the weakest-assumption note in the stress test).

minor comments (3)

The abstract and experimental sections would benefit from reporting the exact train/test bearing counts and split ratios used for each of the four datasets to allow direct replication.
Clarify how the multi-label formulation handles cases where multiple fault types co-occur on the same bearing; an explicit label-encoding example or reference to the ROC implementation would improve clarity.
Figure captions and table headers should explicitly state whether the reported metrics are macro-averaged or micro-averaged ROC-AUC to avoid ambiguity.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their constructive feedback, which has helped us strengthen the discussion of assumptions underlying our proposed evaluation protocol. We address the major comment below and have made corresponding revisions to the manuscript.

Referee: The central claim that bearing-wise splits remove leakage rests on the assumption that physical bearings constitute the dominant source of independent variation; the manuscript should explicitly test or discuss whether shared manufacturing batches or sensor mounting effects across bearings could still induce correlations even after a bearing-wise split (see the weakest-assumption note in the stress test).

Authors: We agree that physical bearings are not necessarily the only possible source of correlation and that factors such as shared manufacturing batches or sensor mounting could in principle induce residual dependencies. Our bearing-wise partitioning is designed to eliminate the most direct and commonly overlooked form of leakage, namely reusing multiple segments or operating conditions from the identical physical bearing, which the literature has shown to produce unrealistically high performance. In the revised manuscript we have added an explicit paragraph in the Discussion section acknowledging this limitation of the assumption, noting that the public datasets (CWRU, PU, UORED-VAFCLS, HUST) lack batch-level or mounting metadata that would allow an empirical stress test of these secondary effects. We therefore treat the bearing-wise split as a necessary but not always sufficient condition for leakage-free evaluation and recommend that future dataset releases include such metadata. This addition does not change our empirical results but clarifies the scope of the claims.

Revision: partial

Standing simulated objections (not resolved)

Explicit empirical testing of manufacturing-batch or sensor-mounting correlations is not possible with the current public datasets because they do not release the required metadata.

Circularity Check

0 steps flagged

No significant circularity

The manuscript is an empirical comparison of dataset partitioning schemes on four public bearing-fault datasets. Its central claims rest on measured performance differences between segment-wise, condition-wise, and bearing-wise splits, together with a multi-label ROC reformulation; none of these quantities are defined in terms of parameters fitted from the same evaluation data, nor do they reduce to self-citations or imported uniqueness theorems. The argument is therefore self-contained against external benchmarks and contains no load-bearing step that collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that bearings constitute independent units for leakage purposes and on standard supervised-learning assumptions about i.i.d. data after proper splitting.

axioms (1)

Domain assumption: Vibration signals from distinct physical bearings are independent enough that placing all data from one bearing entirely in train or test removes spurious correlations.
This premise underpins the recommendation to use bearing-wise partitioning instead of segment-wise or condition-wise splits.

pith-pipeline@v0.9.0 · 5794 in / 1264 out tokens · 57147 ms · 2026-05-21T21:16:07.156534+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

We propose a rigorous, leakage-free evaluation methodology centered on bearing-wise data partitioning, ensuring no overlap between the physical components used for training and testing. Additionally, we reformulate the classification task as a multi-label problem, enabling the detection of co-occurring fault types and the use of prevalence-independent metrics such as Macro AUROC.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

the number of unique training bearings is a decisive factor for achieving robust performance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] Y. Lei, B. Yang, X. Jiang, F. Jia, N. Li, A. K. Nandi, Applications of machine learning to machine fault diagnosis: A review and roadmap, Mechanical Systems and Signal Processing 138 (2020) 106587.

[2] S. Kapoor, E. M. Cantrell, K. Peng, T. H. Pham, C. A. Bail, O. E. Gundersen, J. M. Hofman, J. Hullman, M. A. Lones, M. M. Malik, P. Nanayakkara, R. A. Poldrack, I. D. Raji, M. Roberts, M. J. Salganik, M. Serra-Garcia, B. M. Stewart, G. Vandewiele, A. Narayanan, Reforms: Consensus-based recommendations for machine-learning-based science, Science Advances 1... (2024). doi:10.1126/sciadv.adk3452

[3] S. Kapoor, A. Narayanan, Leakage and the reproducibility crisis in machine-learning-based science, Patterns 4 (9) (2023) 100804. doi:10.1016/j.patter.2023.100804. URL https://www.sciencedirect.com/science/article/pii/S2666389923001599

[4] T. W. Rauber, A. L. da Silva Loca, F. d. A. Boldt, A. L. Rodrigues, F. M. Varejão, An experimental methodology to evaluate machine learning methods for fault diagnosis based on vibration signals, Expert Systems with Applications 167 (2021) 114022. doi:10.1016/j.eswa.2020.114022

[5] J. Hendriks, P. Dumond, D. Knox, Towards better benchmarking using the CWRU bearing fault dataset, Mechanical Systems and Signal Processing 169 (2022) 108732.

[6] H. Abburi, T. Chaudhary, S. H. W. Ilyas, L. Manne, D. Mittal, D. Williams, D. Snaidauf, E. Bowen, B. Veeramani, A closer look at bearing fault classification approaches, in: Annual Conference of the PHM Society, Vol. 15, 2023.

[7] D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, D. I. Warton, B. A. Wintle, F. Hartig, C. F. Dormann, Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography 40 (8) (2017) 913–929. doi:10.1111/ecog.02881

[8] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.

[9] S. Kaufman, S. Rosset, C. Perlich, Leakage in data mining: Formulation, detection, and avoidance, Vol. 6, 2011, pp. 556–563. doi:10.1145/2020408.2020496

[10] M. A. Lones, How to avoid machine learning pitfalls: a guide for academic researchers, CoRR abs/2108.02497 (2021). arXiv:2108.02497. URL https://arxiv.org/abs/2108.02497

[11] M. A. Lones, Avoiding common machine learning pitfalls, Patterns (2024) 101046. doi:10.1016/j.patter.2024.101046

[12] I. M. D. S. Varejão, L. G. D. O. Costa, L. H. P. D. Silva, A. Rodrigues, M. P. Ribeiro, F. M. Varejão, T. Oliveira-Santos, The similarity bias problem: What it is and how it impacts vibration based intelligent fault diagnosis, Mechanical Systems and Signal Processing 235 (2025) 112822, publisher: Elsevier BV. doi:10.1016/j.ymssp.2025.112822. URL https://li...

[13] L. Wheat, M. V. Mohrenschildt, S. Habibi, D. Al-Ani, Impact of Data Leakage in Vibration Signals Used for Bearing Fault Diagnosis, IEEE Access 12 (2024) 169879–169895, publisher: Institute of Electrical and Electronics Engineers (IEEE). doi:10.1109/access.2024.3497716. URL https://ieeexplore.ieee.org/document/10752530/

[14] D. Wang, Y. Li, L. Jia, Y. Song, Y. Liu, Novel three-stage feature fusion method of multimodal data for bearing fault diagnosis, IEEE Transactions on Instrumentation and Measurement 70 (2021) 1–10. doi:10.1109/TIM.2021.3071232

[15] D. Wang, Y. Li, L. Jia, Y. Song, T. Wen, Attention-based bilinear feature fusion method for bearing fault diagnosis, IEEE/ASME Transactions on Mechatronics 28 (3) (2023) 1695–1705. doi:10.1109/TMECH.2022.3223358

[16] C. Lessmeier, J. K. Kimotho, D. Zimmer, W. Sextro, Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification, in: PHM society European conference, Vol. 3, 2016.

[17] I. Tsamardinos, A. Rakhshani, V. Lagani, Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization, International Journal on Artificial Intelligence Tools 24 (05) (2015) 1540023.

[18] M. Sehri, P. Dumond, M. Bouchard, University of Ottawa constant load and speed rolling-element bearing vibration and acoustic fault signature datasets, Data in Brief 49 (2023) 109327, publisher: Elsevier BV. doi:10.1016/j.dib.2023.109327. URL https://linkinghub.elsevier.com/retrieve/pii/S2352340923004456

[19] W. A. Smith, R. B. Randall, Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study, Mechanical Systems and Signal Processing 64-65 (2015) 100–131. doi:10.1016/j.ymssp.2015.04.021. URL https://www.sciencedirect.com/science/article/pii/S0888327015002034

[20] M. González, V. G. Díaz, B. L. Pérez, B. C. P. G-Bustelo, J. P. Anzola, Bearing fault diagnosis with envelope analysis and machine learning approaches using CWRU dataset, IEEE Access 11 (2023) 57796–57805. doi:10.1109/ACCESS.2023.3283466

[21] W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A New Deep Learning Model for Fault Diagnosis with Good Anti-Noise and Domain Adaptation Ability on Raw Vibration Signals, Sensors 17 (2) (2017) 425, publisher: MDPI AG. doi:10.3390/s17020425. URL https://www.mdpi.com/1424-8220/17/2/425

[22] J. Jiao, M. Zhao, J. Lin, C. Ding, Deep Coupled Dense Convolutional Network With Complementary Data for Intelligent Fault Diagnosis, IEEE Transactions on Industrial Electronics 66 (12) (2019) 9858–9867, publisher: Institute of Electrical and Electronics Engineers (IEEE). doi:10.1109/tie.2019.2902817

[23] J. Van Den Hoogen, S. Bloemheuvel, M. Atzmueller, An Improved Wide-Kernel CNN for Classifying Multivariate Signals in Fault Diagnosis, in: 2020 International Conference on Data Mining Workshops (ICDMW), IEEE, Sorrento, Italy, 2020, pp. 275–283. doi:10.1109/icdmw51313.2020.00046. URL https://ieeexplore.ieee.org/document/9346555/

[24] Q. Wei, Y. Liu, X. Ruan, A report on audio tagging with deeper cnn, 1d-convnet and 2d-convnet, DCASE, 2018. URL https://dcase.community/documents/challenge2018/technical_reports/DCASE2018_WEI_53.pdf

[25] P. Tchatchoua, G. Graton, M. Ouladsine, J.-F. Christaud, Application of 1D ResNet for Multivariate Fault Detection on Semiconductor Manufacturing Equipment, Sensors 23 (22) (2023) 9099, publisher: MDPI AG. doi:10.3390/s23229099. URL https://www.mdpi.com/1424-8220/23/22/9099

[26] Z. Yan, H. Liu, SMoCo: A Powerful and Efficient Method Based on Self-Supervised Learning for Fault Diagnosis of Aero-Engine Bearing under Limited Data, Mathematics 10 (15) (2022) 2796, publisher: MDPI AG. doi:10.3390/math10152796. URL https://www.mdpi.com/2227-7390/10/15/2796

J. P. Vieira et al.: Preprint submitted to Elsevier

[1] [1]

Y. Lei, B. Yang, X. Jiang, F. Jia, N. Li, A. K. Nandi, Applications of machine learning to machine fault diagnosis: A review and roadmap, Mechanical Systems and Signal Processing 138 (2020) 106587

work page 2020

[2] [2]

Kapoor, E

S. Kapoor, E. M. Cantrell, K. Peng, T. H. Pham, C. A. Bail, O. E. Gundersen, J. M. Hofman, J. Hullman, M. A. Lones, M. M. Malik, P. Nanayakkara, R. A. Poldrack, I. D. Raji, M. Roberts, M. J. Salganik, M. Serra-Garcia, B. M. Stewart, G. Vandewiele, A. Narayanan, Reforms: Consensus-based recommendations for machine-learning-based science, Science Advances 1...

work page doi:10.1126/sciadv.adk3452 2024

[3] [3]

Kapoor, A

S. Kapoor, A. Narayanan, Leakage and the reproducibility crisis in machine-learning-based science, Patterns 4 (9) (2023) 100804.doi: https://doi.org/10.1016/j.patter.2023.100804. URL https://www.sciencedirect.com/science/article/pii/S2666389923001599

work page doi:10.1016/j.patter.2023.100804 2023

[4] [4]

doi:10.1016/j.eswa.2020

T.W.Rauber,A.L.daSilvaLoca,F.d.A.Boldt,A.L.Rodrigues,F.M.Varejão,Anexperimentalmethodologytoevaluatemachinelearning methodsforfaultdiagnosisbasedonvibrationsignals,ExpertSystemswithApplications167(2021)114022. doi:10.1016/j.eswa.2020. 114022

work page doi:10.1016/j.eswa.2020 2021

[5] [5]

Hendriks, P

J. Hendriks, P. Dumond, D. Knox, Towards better benchmarking using the CWRU bearing fault dataset, Mechanical Systems and Signal Processing 169 (2022) 108732

work page 2022

[6] [6]

Abburi, T

H. Abburi, T. Chaudhary, S. H. W. Ilyas, L. Manne, D. Mittal, D. Williams, D. Snaidauf, E. Bowen, B. Veeramani, A closer look at bearing fault classification approaches, in: Annual Conference of the PHM Society, Vol. 15, 2023

work page 2023

[7] [7]

D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, D. I. Warton, B. A. Wintle, F. Hartig, C. F. Dormann, Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography 40 (8) (2017) 913–929.doi:10.1111/ecog.02881

work page doi:10.1111/ecog.02881 2017

[8] [8]

Passos, D

F.Pedregosa,G.Varoquaux,A.Gramfort,V.Michel,B.Thirion,O.Grisel,M.Blondel,P.Prettenhofer,R.Weiss,V.Dubourg,J.Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830

work page 2011

[9] [9]

doi: 10.1145/2020408.2020496

S. Kaufman, S. Rosset, C. Perlich, Leakage in data mining: Formulation, detection, and avoidance, Vol. 6, 2011, pp. 556–563.doi: 10.1145/2020408.2020496

work page doi:10.1145/2020408.2020496 2011

[10] [10]

arXiv:2108.02497

M.A.Lones,Howtoavoidmachinelearningpitfalls:aguideforacademicresearchers,CoRRabs/2108.02497(2021). arXiv:2108.02497. URL https://arxiv.org/abs/2108.02497

work page arXiv 2021

[11] [11]

M. A. Lones, Avoiding common machine learning pitfalls, Patterns (2024) 101046doi:10.1016/j.patter.2024.101046

work page doi:10.1016/j.patter.2024.101046 2024

[12] [12]

I. M. D. S. Varejão, L. G. D. O. Costa, L. H. P. D. Silva, A. Rodrigues, M. P. Ribeiro, F. M. Varejão, T. Oliveira-Santos, The similarity bias problem: What it is and how it impacts vibration based intelligent fault diagnosis, Mechanical Systems and Signal Processing 235 (2025) 112822, publisher: Elsevier BV.doi:10.1016/j.ymssp.2025.112822. URL https://li...

work page doi:10.1016/j.ymssp.2025.112822 2025

[13]

L. Wheat, M. V. Mohrenschildt, S. Habibi, D. Al-Ani, Impact of Data Leakage in Vibration Signals Used for Bearing Fault Diagnosis, IEEE Access 12 (2024) 169879–169895, publisher: Institute of Electrical and Electronics Engineers (IEEE). doi:10.1109/access.2024.3497716. URL https://ieeexplore.ieee.org/document/10752530/

[14]

D. Wang, Y. Li, L. Jia, Y. Song, Y. Liu, Novel three-stage feature fusion method of multimodal data for bearing fault diagnosis, IEEE Transactions on Instrumentation and Measurement 70 (2021) 1–10. doi:10.1109/TIM.2021.3071232

[15]

D. Wang, Y. Li, L. Jia, Y. Song, T. Wen, Attention-based bilinear feature fusion method for bearing fault diagnosis, IEEE/ASME Transactions on Mechatronics 28 (3) (2023) 1695–1705. doi:10.1109/TMECH.2022.3223358

[16]

C. Lessmeier, J. K. Kimotho, D. Zimmer, W. Sextro, Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification, in: PHM society European conference, Vol. 3, 2016

[17]

I. Tsamardinos, A. Rakhshani, V. Lagani, Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization, International Journal on Artificial Intelligence Tools 24 (05) (2015) 1540023

[18]

M. Sehri, P. Dumond, M. Bouchard, University of Ottawa constant load and speed rolling-element bearing vibration and acoustic fault signature datasets, Data in Brief 49 (2023) 109327, publisher: Elsevier BV. doi:10.1016/j.dib.2023.109327. URL https://linkinghub.elsevier.com/retrieve/pii/S2352340923004456

[19]

W. A. Smith, R. B. Randall, Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study, Mechanical Systems and Signal Processing 64-65 (2015) 100–131. doi:10.1016/j.ymssp.2015.04.021. URL https://www.sciencedirect.com/science/article/pii/S0888327015002034

[20]

M. González, V. G. Díaz, B. L. Pérez, B. C. P. G-Bustelo, J. P. Anzola, Bearing fault diagnosis with envelope analysis and machine learning approaches using CWRU dataset, IEEE Access 11 (2023) 57796–57805. doi:10.1109/ACCESS.2023.3283466

[21]

W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A New Deep Learning Model for Fault Diagnosis with Good Anti-Noise and Domain Adaptation Ability on Raw Vibration Signals, Sensors 17 (2) (2017) 425, publisher: MDPI AG. doi:10.3390/s17020425. URL https://www.mdpi.com/1424-8220/17/2/425

[22]

J. Jiao, M. Zhao, J. Lin, C. Ding, Deep Coupled Dense Convolutional Network With Complementary Data for Intelligent Fault Diagnosis, IEEE Transactions on Industrial Electronics 66 (12) (2019) 9858–9867, publisher: Institute of Electrical and Electronics Engineers (IEEE). doi:10.1109/tie.2019...

work page doi:10.1109/tie.2019.2902817 2019

[23]

J. Van Den Hoogen, S. Bloemheuvel, M. Atzmueller, An Improved Wide-Kernel CNN for Classifying Multivariate Signals in Fault Diagnosis, in: 2020 International Conference on Data Mining Workshops (ICDMW), IEEE, Sorrento, Italy, 2020, pp. 275–283. doi:10.1109/icdmw51313.2020.00046. URL https://ieeexplore.ieee.org/document/9346555/

[24]

Q. Wei, Y. Liu, X. Ruan, A report on audio tagging with deeper cnn, 1d-convnet and 2d-convnet, DCASE, 2018. URL https://dcase.community/documents/challenge2018/technical_reports/DCASE2018_WEI_53.pdf

[25]

P. Tchatchoua, G. Graton, M. Ouladsine, J.-F. Christaud, Application of 1D ResNet for Multivariate Fault Detection on Semiconductor Manufacturing Equipment, Sensors 23 (22) (2023) 9099, publisher: MDPI AG. doi:10.3390/s23229099. URL https://www.mdpi.com/1424-8220/23/22/9099

[26]

Z. Yan, H. Liu, SMoCo: A Powerful and Efficient Method Based on Self-Supervised Learning for Fault Diagnosis of Aero-Engine Bearing under Limited Data, Mathematics 10 (15) (2022) 2796, publisher: MDPI AG. doi:10.3390/math10152796. URL https://www.mdpi.com/2227-7390/10/15/2796