CWT-Enhanced Vibration Sensing With Spatial Fault Localization Using YOLO

Chun-Yu Yeh; Jen-Yu Chiu; Po-Heng Chou; Ru-Ping Lin; Wei-Lung Mao

arXiv: 2509.03070 · v4 · submitted 2025-09-03 · eess.SP · cs.AI · cs.CV · cs.LG · eess.IV

Pith reviewed 2026-05-18 19:41 UTC · model grok-4.3

classification: eess.SP · cs.AI · cs.CV · cs.LG · eess.IV

keywords: continuous wavelet transform · bearing fault detection · vibration sensing · YOLO object detection · spectrogram analysis · spatial localization · non-stationary signals

The pith

Transforming bearing vibration signals into CWT spectrograms lets YOLO models localize faults with mAP scores up to 99.5 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that continuous wavelet transform spectrograms make weak and non-stationary fault patterns in vibration data more observable than raw time series or short-time Fourier transforms. By treating these spectrograms as images and using YOLO object detectors to find and classify the regions of high energy associated with faults, the method achieves strong performance on three public bearing datasets. This matters for industrial monitoring because accurate early detection of bearing faults can prevent costly equipment breakdowns in machines that run continuously. The spatial localization adds interpretability by tying specific patterns in the time-frequency plane to known fault types like inner race or outer race defects.

Core claim

The paper claims that a framework using CWT to generate spectrograms from vibration signals, followed by YOLOv9, YOLOv10, or YOLOv11 for detecting fault-related energy regions, improves the detectability and robustness of bearing fault sensing. This yields mAP values of up to 99.4% on the CWRU dataset, 97.8% on the PU dataset, and 99.5% on the IMS dataset, outperforming time-series models, modern vision backbones, and STFT representations while providing an interpretable link between energy distributions and fault characteristics.
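Since the headline numbers are mAP values, it may help to recall how average precision is typically computed for a box detector. The following is a minimal sketch under stated assumptions: a single class, a single image, an IoU threshold of 0.5, and all-point interpolation of the precision-recall curve. The function names `iou` and `average_precision` are illustrative, and the paper's exact evaluation settings are not given in this page.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]],
                      truths: List[Box],
                      iou_thr: float = 0.5) -> float:
    """AP for one class: area under the interpolated precision-recall curve.

    preds holds (confidence, box) pairs; truths holds ground-truth boxes.
    """
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    tp_flags = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, truth in enumerate(truths):
            overlap = iou(box, truth)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True  # each ground-truth box is matched at most once
        tp_flags.append(best_j >= 0)

    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(truths))
    # Make precision non-increasing when read from left to right.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of the per-class AP values. Real evaluators pool predictions over the whole test set rather than one image, so this sketch only illustrates the metric's shape.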

What carries the argument

CWT spectrograms fed into YOLO models for spatial localization of fault energy regions on the time-frequency plane.
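To make the first stage concrete, here is a rough sketch of how a vibration segment could be turned into a CWT magnitude map (a scalogram) with a complex Morlet wavelet. It uses only the Python standard library and direct convolution, so it is slow and meant for illustration. The centre frequency parameter `w0 = 6`, the truncation at four standard deviations, and the function names are assumptions of this sketch, not settings taken from the paper.

```python
import cmath
import math
from typing import List


def morlet(t: float, w0: float = 6.0) -> complex:
    """Complex Morlet wavelet, omitting the small admissibility correction."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(x: List[float], fs: float, freqs: List[float],
                  w0: float = 6.0) -> List[List[float]]:
    """Return |W(f, t)|, one row per analysis frequency, one column per sample."""
    dt = 1.0 / fs
    rows = []
    for f in freqs:
        a = w0 / (2.0 * math.pi * f)         # scale whose centre frequency is f
        half = int(math.ceil(4.0 * a / dt))  # truncate the wavelet at +/- 4 std
        kernel = [morlet(k * dt / a, w0).conjugate()
                  for k in range(-half, half + 1)]
        norm = dt / math.sqrt(a)             # 1/sqrt(a) scaling times the step dt
        row = []
        for n in range(len(x)):
            acc = 0j
            for k in range(-half, half + 1):
                m = n + k
                if 0 <= m < len(x):          # zero padding at the edges
                    acc += x[m] * kernel[k + half]
            row.append(abs(acc) * norm)
        rows.append(row)
    return rows


# Usage: a tone that jumps from 50 Hz to 200 Hz halfway through.
fs = 1000.0
signal = [math.sin(2 * math.pi * (50.0 if n < 200 else 200.0) * n / fs)
          for n in range(400)]
scalogram = cwt_scalogram(signal, fs, freqs=[50.0, 200.0])
```

Each row of `scalogram` would light up only while its frequency is present, which is the kind of localized energy region a detector can draw a box around. A production pipeline would more likely use an FFT-based CWT from a signal-processing library and render the result to an image.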

If this is right

Enhances observability of weak and non-stationary fault signatures in vibration signals.
Delivers high mean average precision on standard bearing fault datasets CWRU, PU, and IMS.
Supplies region-aware localization that connects time-frequency energy patterns to specific bearing fault types.
Provides a generalizable method for vibration-based fault monitoring in changing conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending this approach to other types of rotating machinery could improve fault detection in gearboxes or pumps.
Integrating the system with real-time data streams might support online predictive maintenance applications.
Testing alternative wavelet bases or scales could reveal even better representations for particular fault frequencies.

Load-bearing premise

Converting vibration signals to CWT spectrograms substantially improves the visibility of weak and non-stationary fault signatures compared with direct time-series input or STFT spectrograms.
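For reference, the premise rests on the difference between the two transforms. The block below gives the standard textbook definitions, written in generic notation that may not match the paper's own symbols or normalization: the CWT $W_x(a,b)$ with scale $a$ and shift $b$, the Morlet wavelet $\psi$ with centre frequency $\omega_0$ (admissibility correction omitted), and the STFT $S_x(\tau,f)$ with a fixed window $g$.

```latex
\begin{align}
  W_x(a,b) &= \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,
              \psi^{*}\!\left(\frac{t-b}{a}\right) dt, \qquad a > 0, \\
  \psi(t)  &= \pi^{-1/4}\, e^{j\omega_0 t}\, e^{-t^{2}/2}, \\
  S_x(\tau,f) &= \int_{-\infty}^{\infty} x(t)\, g(t-\tau)\, e^{-j 2\pi f t}\, dt .
\end{align}
```

In the STFT the window $g$ has one fixed width, so time and frequency resolution are the same everywhere on the plane. In the CWT the effective window width grows with $a$, giving short windows at high frequencies and long windows at low frequencies. That is the usual argument for why brief, impulsive fault signatures may be easier to see in a scalogram, though whether it translates into detector accuracy is an empirical question.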

What would settle it

Running the same YOLO detectors on STFT spectrograms or raw signal plots from the CWRU, PU, and IMS datasets and obtaining equal or higher mAP scores would indicate that the performance gains are not specifically due to the CWT transformation.
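The test described above can be framed as a small harness in which the transform is the only thing allowed to vary. The sketch below is hypothetical scaffolding: `Protocol`, `run_ablation`, and the `train_and_score` callback are names invented for illustration, and the callback stands in for whatever detector training and mAP evaluation routine is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Signal = Sequence[float]
Image = List[List[float]]


@dataclass(frozen=True)
class Protocol:
    """Everything that must stay fixed across transforms (frozen on purpose)."""
    detector: str
    seed: int
    epochs: int
    train_ids: Tuple[int, ...]
    test_ids: Tuple[int, ...]


def run_ablation(signals: Dict[int, Signal],
                 transforms: Dict[str, Callable[[Signal], Image]],
                 train_and_score: Callable[[Dict[int, Image], Protocol], float],
                 protocol: Protocol) -> Dict[str, float]:
    """Score each transform under one shared, immutable protocol."""
    scores = {}
    for name, transform in transforms.items():
        # Only the signal-to-image step differs between runs.
        images = {idx: transform(sig) for idx, sig in signals.items()}
        scores[name] = train_and_score(images, protocol)
    return scores
```

A call might pass `{"cwt": to_cwt_image, "stft": to_stft_image}` as `transforms`, where both functions are placeholders supplied by the user. Because `Protocol` is frozen, the splits, seed, and training budget cannot drift between the CWT and STFT runs.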

Figures

Figures reproduced from arXiv: 2509.03070 by Chun-Yu Yeh, Jen-Yu Chiu, Po-Heng Chou, Ru-Ping Lin, Wei-Lung Mao.

**Figure 2.** CWT-based spectrograms for four bearing conditions.

Original abstract

This letter presents a CWT-enhanced vibration sensing framework for bearing fault monitoring through spatial localization on time-frequency spectrograms. Vibration signals are transformed into continuous wavelet transform (CWT) spectrograms to improve the observability of weak and non-stationary fault signatures, and YOLOv9, YOLOv10, and YOLOv11 are employed to localize and identify fault-related energy regions. Experiments on the CWRU, PU, and IMS datasets show that the proposed framework improves the detectability and robustness of fault-related sensing patterns compared with conventional time-series models, modern vision backbones, and short-time Fourier transform (STFT)-based representations, achieving mAP values up to 99.4%, 97.8%, and 99.5%, respectively. In addition, the region-aware localization provides a more interpretable connection between time-frequency energy distributions and bearing fault characteristics. These results demonstrate that spatial localization on CWT spectrograms offers an effective and generalizable approach for enhancing vibration sensing capability in non-stationary environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CWT plus YOLO yields high mAP on public bearing datasets, but the gains are not yet isolated from the choice of 2D representation or training details.

The core of this paper is straightforward: take vibration signals from the CWRU, PU, and IMS sets, convert them to CWT spectrograms, and run YOLOv9/10/11 to localize fault energy patches, reaching mAP up to 99.5 percent. That pairing is new enough in the condition-monitoring literature to be worth noting, and the authors show that the resulting boxes give a readable link between time-frequency concentrations and known fault types. The comparisons to plain time-series classifiers and to STFT versions are also useful for practitioners who already work with spectrograms. Those parts are executed cleanly enough on standard public data to deserve a look.

The soft spot is the missing control that would actually test the central claim. The abstract states that CWT improves observability of weak non-stationary signatures over STFT, yet nothing in the reported experiments holds the detector, splits, augmentations, and hyperparameters fixed while swapping only the transform. Without that ablation the performance delta could come from the general benefit of treating any 2D energy map as an object-detection problem or from dataset-specific tuning. The paper also does not spell out whether the same exclusion rules or hyperparameter search were applied uniformly to every baseline.

This is the kind of applied work that industrial monitoring groups might try out, especially if they already have labeled spectrograms and want localization on top of classification. A reader who needs a quick pipeline on these exact datasets will get value; someone looking for a new theoretical handle on non-stationary signals will not.

I would send it to peer review. The datasets are public, the numbers are high, and the localization output is a concrete addition, so referees can ask for the missing ablation and implementation details without wasting time on a fundamentally broken idea.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes transforming bearing vibration signals into continuous wavelet transform (CWT) spectrograms and applying YOLOv9, YOLOv10, and YOLOv11 detectors to localize and classify fault-related energy regions on these 2D representations. Experiments on the CWRU, PU, and IMS datasets report mAP values up to 99.4%, 97.8%, and 99.5%, claiming superior detectability and robustness compared to conventional time-series models, modern vision backbones, and STFT-based representations, along with improved interpretability via spatial localization.

Significance. If the performance gains can be robustly attributed to the CWT representation and the spatial localization approach, the work could advance vibration-based fault monitoring by improving observability of weak, non-stationary signatures through time-frequency imaging combined with object detection. This offers a potentially more interpretable alternative to 1D time-series analysis for industrial bearing diagnostics.

major comments (2)

[Abstract and experimental results] The manuscript reports high mAP values but provides no details on train-test splits, hyperparameter search procedures, or confirmation that data exclusions and augmentations were applied uniformly across all compared methods (time-series models, vision backbones, and STFT representations). This leaves open the possibility of post-hoc selection effects that could affect the validity of the cross-method comparisons.
[Comparison to STFT-based representations] The central claim that CWT spectrograms meaningfully enhance observability of weak and non-stationary fault signatures (relative to STFT) and drive the reported mAP gains requires a controlled ablation that holds the YOLO detector, training protocol, data splits, and hyperparameters fixed while swapping only the time-frequency transform. Without this, the performance delta cannot be isolated from the general advantage of applying spatial object detection to any 2D energy map.

minor comments (1)

[Method] Clarify the exact YOLO variants and any modifications made to the standard architectures or loss functions when applied to spectrogram inputs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and have revised the manuscript to enhance experimental transparency and isolate the contribution of the CWT representation.

Referee: [Abstract and experimental results] The manuscript reports high mAP values but provides no details on train-test splits, hyperparameter search procedures, or confirmation that data exclusions and augmentations were applied uniformly across all compared methods (time-series models, vision backbones, and STFT representations). This leaves open the possibility of post-hoc selection effects that could affect the validity of the cross-method comparisons.

Authors: We agree that the original manuscript did not provide sufficient detail on the experimental protocol, which is necessary for validating the cross-method comparisons. In the revised version, we have added a new subsection titled 'Experimental Protocol' that explicitly describes the train-test splits (stratified 70/30 split per dataset with fixed random seed for reproducibility), the hyperparameter search procedure (systematic grid search over learning rate, batch size, optimizer, and epochs with 5-fold cross-validation), and confirmation that data exclusions and augmentations were applied identically to all baselines, including time-series models, vision backbones, and STFT representations. These additions address concerns about post-hoc selection effects.
Revision: yes

Referee: [Comparison to STFT-based representations] The central claim that CWT spectrograms meaningfully enhance observability of weak and non-stationary fault signatures (relative to STFT) and drive the reported mAP gains requires a controlled ablation that holds the YOLO detector, training protocol, data splits, and hyperparameters fixed while swapping only the time-frequency transform. Without this, the performance delta cannot be isolated from the general advantage of applying spatial object detection to any 2D energy map.

Authors: We concur that a strictly controlled ablation is required to isolate the effect of the time-frequency representation. While the original experiments applied the same YOLO detectors across representations, we have now conducted and included an additional controlled ablation study in the revised manuscript. This study fixes the detector (YOLOv11), training protocol, data splits, and all hyperparameters, varying only the transform (CWT versus STFT). The new results, reported in an expanded Table and accompanying text, show consistent mAP gains for CWT (average +2.1% across the three datasets), supporting the claim that CWT improves observability of weak non-stationary signatures beyond the general benefit of 2D detection.
Revision: yes
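The simulated rebuttal mentions a stratified 70/30 split with a fixed random seed. One way such a split could be written is sketched below using only the standard library. The function name `stratified_split`, the default seed, and the per-class rounding rule are assumptions of this sketch and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable],
                     train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class keeps roughly the same train fraction."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local generator, so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class, key=str):  # fixed class order for determinism
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))
        train += idxs[:cut]
        test += idxs[cut:]
    return sorted(train), sorted(test)


# Usage: three fault classes with ten segments each.
labels = ["normal"] * 10 + ["inner_race"] * 10 + ["outer_race"] * 10
train_idx, test_idx = stratified_split(labels, train_frac=0.7, seed=42)
```

Reusing the same `train_idx` and `test_idx` for every baseline is what makes the cross-method comparison in the rebuttal meaningful.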

Circularity Check

0 steps flagged

No significant circularity; empirical results from standard supervised training

The paper describes a standard ML pipeline: vibration signals are converted to CWT spectrograms and fed to off-the-shelf YOLO detectors for object detection on labeled data. Reported mAP values (99.4% CWRU, 97.8% PU, 99.5% IMS) are direct outputs of supervised training and evaluation on fixed benchmark splits. No equations, fitted parameters, or self-citations are presented that would make any claimed improvement equivalent to its own inputs by construction. The framework contains no derivation chain that collapses to a self-definition or renamed fit; performance numbers are externally falsifiable via replication on the same public datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that CWT spectrograms make weak fault signatures more observable than time-series or STFT inputs; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: CWT spectrograms improve observability of weak and non-stationary fault signatures compared with conventional representations.
Basis: directly stated in the abstract as the motivation for the transformation step.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Vibration signals are transformed into continuous wavelet transform (CWT) spectrograms ... YOLOv9, YOLOv10, and YOLOv11 are employed to localize and identify fault-related energy regions."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "The CWT of a signal x(t) is defined as ... we adopt the Morlet wavelet"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]
S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep learning algorithms for bearing fault diagnostics—a comprehensive review,” IEEE Access, vol. 8, pp. 29857–29881, Feb. 2020

[2]
D. Neupane and J. Seok, “Bearing fault detection and diagnosis using Case Western Reserve University dataset with deep learning approaches: A review,” IEEE Access, vol. 8, pp. 93155–93178, Jun. 2020

[3]
M. Xia, T. Li, L. Xu, L. Liu, and C. W. de Silva, “Fault diagnosis for rotating machinery using multiple sensors and convolutional neural networks,” IEEE/ASME Trans. Mechatronics, vol. 23, no. 1, pp. 101–110, Feb. 2018

[4]
T. A. C. M. Claasen and W. F. G. Mecklenbrauker, “The Wigner distribution: A tool for time-frequency signal analysis,” Philips J. Res., vol. 35, no. 3, pp. 217–250, 1980

[5]
N. E. Huang et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proc. Roy. Soc. Lond. A Math. Phys. Eng. Sci., vol. 454, no. 1971, pp. 903–995, Mar. 1998

[6]
H. Li, “Bearing fault diagnosis based on time scale spectrum of continuous wavelet transform,” in Proc. 2011 8th Int. Conf. Fuzzy Syst. Knowl. Discov. (FSKD), vol. 3, Shanghai, China, Jul. 2011, pp. 1934–1937

[7]
A. R. Patil, S. Buchaiah, and P. Shakya, “Combined VMD-Morlet wavelet filter based signal de-noising approach and its applications in bearing fault diagnosis,” J. Vib. Eng. Technol., vol. 12, pp. 7929–7953, 2024

[8]
H. Pan, X. He, S. Tang, and F. Meng, “An improved bearing fault diagnosis method using one-dimensional CNN and LSTM,” J. Mech. Eng., vol. 64, no. 7–8, pp. 443–452, May 2018

[9]
X. Chen, B. Zhang, and D. Gao, “Bearing fault diagnosis based on multi-scale CNN and LSTM model,” J. Intell. Manuf., vol. 32, no. 4, pp. 971–987, Jun. 2021

[10]
M. Hakim, A. A. B. Omran, A. N. Ahmed, M. Al-Waily, and A. Abdellatif, “A systematic review of rolling bearing fault diagnoses based on deep learning and transfer learning: Taxonomy, overview, application, open challenges, weaknesses and recommendations,” Ain Shams Eng. J., vol. 14, no. 4, p. 101945, Apr. 2023

[11]
C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, “YOLOv9: Learning what you want to learn using programmable gradient information,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Cham: Springer, Oct. 2025, pp. 1–21

[12]
A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, “YOLOv10: Real-time end-to-end object detection,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 37, Dec. 2024, pp. 107984–108011

[13]
Ultralytics, “YOLOv11: Real-time object detection with enhanced feature extraction,” arXiv preprint arXiv:2501.13400, Jan. 2025 (linked work page title: “YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review”; internal anchor)

[14]
Case Western Reserve University Bearing Data Center, “Bearing data center,” 2020. [Online]. Available: https://engineering.case.edu/bearingdatacenter. Accessed: May 8, 2025

[15]
C. Lessmeier, J. K. Kimotho, D. Zimmer, and W. Sextro, “Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data,” in Proc. Eur. Conf. Progn. Health Manage. Soc., Bilbao, Spain, Jul. 2016, pp. 1–8

[16]
W. Gousseau, J. Antoni, F. Girardin, and J. Griffaton, “Analysis of the rolling element bearing data set of the Center for Intelligent Maintenance Systems of the University of Cincinnati,” Surveillance, Feb. 2018
