
arxiv: 2509.03070 · v4 · submitted 2025-09-03 · eess.SP · cs.AI · cs.CV · cs.LG · eess.IV

CWT-Enhanced Vibration Sensing With Spatial Fault Localization Using YOLO

Pith reviewed 2026-05-18 19:41 UTC · model grok-4.3

classification eess.SP · cs.AI · cs.CV · cs.LG · eess.IV
keywords continuous wavelet transform · bearing fault detection · vibration sensing · YOLO object detection · spectrogram analysis · spatial localization · non-stationary signals

The pith

Transforming bearing vibration signals into CWT spectrograms lets YOLO models localize faults with mAP scores up to 99.5 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that continuous wavelet transform spectrograms make weak and non-stationary fault patterns in vibration data more observable than raw time series or short-time Fourier transforms. By treating these spectrograms as images and using YOLO object detectors to find and classify the regions of high energy associated with faults, the method achieves strong performance on three public bearing datasets. This matters for industrial monitoring because accurate early detection of bearing faults can prevent costly equipment breakdowns in machines that run continuously. The spatial localization adds interpretability by tying specific patterns in the time-frequency plane to known fault types like inner race or outer race defects.

Core claim

The paper claims that a framework using CWT to generate spectrograms from vibration signals, followed by YOLOv9, YOLOv10, or YOLOv11 for detecting fault-related energy regions, improves the detectability and robustness of bearing fault sensing. This yields mAP values of up to 99.4% on the CWRU dataset, 97.8% on the PU dataset, and 99.5% on the IMS dataset, outperforming time-series models, modern vision backbones, and STFT representations while providing an interpretable link between energy distributions and fault characteristics.
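
Since the headline numbers are mAP values, a minimal sketch of how average precision is typically computed for one class may help in reading them. This summary does not say which variant the paper reports (for example mAP at IoU 0.5 versus averaged over several IoU thresholds), so the `iou_thr=0.5` default, the greedy matching rule, and the function names `iou` and `average_precision` are illustrative assumptions rather than the authors' evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for a single class.

    detections: list of (score, box); ground_truth: list of boxes.
    Detections are matched greedily, highest score first, to the best
    still-unmatched ground-truth box (a simplification, assumed here).
    The area under the precision-recall curve uses all-point interpolation.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    tp_flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best:
                best, best_idx = overlap, i
        if best >= iou_thr:
            matched.add(best_idx)
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing when read from left to right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of these per-class AP values over the fault classes.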

What carries the argument

CWT spectrograms fed into YOLO models for spatial localization of fault energy regions on the time-frequency plane.
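
The first stage of that pipeline can be sketched as follows. This is a minimal illustration only, assuming a complex Morlet wavelet, a direct (non-FFT) summation, and a synthetic sinusoid as input; the wavelet, scale grid, and image rendering used in the paper are not stated in this summary, so `morlet_cwt`, `w0=6.0`, and the test signal are hypothetical choices.

```python
import cmath
import math


def morlet_cwt(signal, scales, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale.

    Uses a complex Morlet wavelet psi(u) = exp(j*w0*u) * exp(-u**2 / 2).
    Scales are in samples. Written for clarity, not speed.
    """
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(4 * s))  # truncate the Gaussian envelope at 4 sigma
        offsets = range(-half, half + 1)
        # Conjugated, scale-normalised wavelet samples psi*(k / s) / sqrt(s).
        kernel = [
            cmath.exp(-1j * w0 * (k / s)) * math.exp(-0.5 * (k / s) ** 2) / math.sqrt(s)
            for k in offsets
        ]
        row = []
        for t in range(n):
            acc = 0j
            for weight, k in zip(kernel, offsets):
                idx = t + k
                if 0 <= idx < n:  # samples outside the record count as zero
                    acc += signal[idx] * weight
            row.append(abs(acc))
        rows.append(row)
    return rows


# Synthetic check: a sinusoid at 0.05 cycles/sample should respond most
# strongly near scale w0 / (2 * pi * 0.05), i.e. around 19 to 20 samples.
signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(256)]
scales = [5, 10, 20, 40]
scalogram = morlet_cwt(signal, scales)
mid_column = [row[128] for row in scalogram]
best_scale = scales[mid_column.index(max(mid_column))]
```

The resulting 2D magnitude array is what would be rendered as an image and handed to the detector; in practice an FFT-based library routine would replace the nested loops.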

If this is right

  • Enhances observability of weak and non-stationary fault signatures in vibration signals.
  • Delivers mean average precision of up to 99.4% on CWRU, 97.8% on PU, and 99.5% on IMS, three standard bearing fault datasets.
  • Supplies region-aware localization that connects time-frequency energy patterns to specific bearing fault types.
  • Provides a generalizable method for vibration-based fault monitoring in changing conditions.
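
To make the region-aware localization concrete, here is a hedged sketch of turning a high-energy patch of a time-frequency map into a detector label. The thresholding rule, `threshold_ratio`, and the function name are assumptions for illustration; the summary does not describe how the paper's bounding boxes were annotated. The output follows the widely used YOLO text-label convention of a class index followed by box centre, width, and height, all normalised to the image size.

```python
def energy_box_to_yolo(scalogram, threshold_ratio=0.5, class_id=0):
    """Bound all cells with energy >= threshold_ratio * peak in one box.

    scalogram: list of rows (scale or frequency axis) of equal length (time axis).
    Returns (class_id, x_center, y_center, width, height), normalised to [0, 1].
    """
    n_rows = len(scalogram)
    n_cols = len(scalogram[0])
    peak = max(max(row) for row in scalogram)
    cutoff = threshold_ratio * peak
    hits = [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if scalogram[r][c] >= cutoff
    ]
    r_min = min(r for r, _ in hits)
    r_max = max(r for r, _ in hits)
    c_min = min(c for _, c in hits)
    c_max = max(c for _, c in hits)
    # Each cell covers [index, index + 1), so the box spans one extra unit.
    x_center = (c_min + c_max + 1) / 2 / n_cols
    y_center = (r_min + r_max + 1) / 2 / n_rows
    width = (c_max - c_min + 1) / n_cols
    height = (r_max - r_min + 1) / n_rows
    return class_id, x_center, y_center, width, height
```

A single box per image is a simplification; a fault signature made of several separated bursts would need one box per connected region.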

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending this approach to other types of rotating machinery could improve fault detection in gearboxes or pumps.
  • Integrating the system with real-time data streams might support online predictive maintenance applications.
  • Testing alternative wavelet bases or scales could reveal even better representations for particular fault frequencies.

Load-bearing premise

Converting vibration signals to CWT spectrograms substantially improves the visibility of weak and non-stationary fault signatures compared with direct time-series input or STFT spectrograms.
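
The usual textbook argument behind this premise can be written out as follows. It is offered as background under standard definitions, not as the paper's own derivation; the symbols $w$, $\psi$, $a$, and $b$ are the conventional ones and do not come from the source.

```latex
% STFT with a fixed analysis window w(t):
X_{\mathrm{STFT}}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j 2\pi f t}\, dt

% CWT with mother wavelet \psi, scale a > 0, and translation b:
W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt

% The STFT resolution (\Delta t, \Delta f) is set once by w and is the same at every frequency.
% For the CWT the resolution scales with a, while the product is fixed:
\Delta t(a) = a\,\Delta t_{\psi}, \qquad
\Delta f(a) = \frac{\Delta f_{\psi}}{a}, \qquad
\Delta t(a)\,\Delta f(a) = \Delta t_{\psi}\,\Delta f_{\psi} \ge \frac{1}{4\pi}
```

Small scales therefore give fine time resolution for short impulsive events, while large scales give fine frequency resolution for slow components. Whether this translates into better detection is the empirical question the paper's comparison has to answer.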

What would settle it

Running the same YOLO detectors on STFT spectrograms or raw signal plots from the CWRU, PU, and IMS datasets and obtaining equal or higher mAP scores would indicate that the performance gains are not specifically due to the CWT transformation.
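
One way to organise that test is sketched below: enumerate every detector, dataset, and input representation under identical settings, then apply the decision rule from the paragraph above. The values in `SHARED` are placeholders (the summary gives no training hyperparameters), and `build_run_matrix` and `cwt_gain_refuted` are hypothetical helper names.

```python
from itertools import product

DETECTORS = ["YOLOv9", "YOLOv10", "YOLOv11"]
DATASETS = ["CWRU", "PU", "IMS"]
INPUTS = ["cwt", "stft", "raw_plot"]

# Placeholder settings shared by every run; None marks values not given here.
SHARED = {"seed": 0, "train_fraction": 0.7, "epochs": None, "image_size": None}


def build_run_matrix():
    """One run per (detector, dataset, input), all with the same settings."""
    return [
        {"detector": d, "dataset": s, "input": i, **SHARED}
        for d, s, i in product(DETECTORS, DATASETS, INPUTS)
    ]


def cwt_gain_refuted(map_scores):
    """True if a non-CWT input equals or beats CWT for any detector/dataset pair.

    map_scores maps (detector, dataset, input) to the measured mAP.
    """
    for d, s in product(DETECTORS, DATASETS):
        cwt = map_scores[(d, s, "cwt")]
        if any(map_scores[(d, s, other)] >= cwt for other in ("stft", "raw_plot")):
            return True
    return False
```

The rule is deliberately strict: a single tie counts against the CWT-specific claim. A looser variant could require the gap to exceed run-to-run variance across seeds.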

Figures

Figures reproduced from arXiv: 2509.03070 by Chun-Yu Yeh, Jen-Yu Chiu, Po-Heng Chou, Ru-Ping Lin, Wei-Lung Mao.

Figure 1. Sample vibration signals for four bearing conditions.
Figure 2. CWT-based spectrograms for four bearing conditions.

Original abstract

This letter presents a CWT-enhanced vibration sensing framework for bearing fault monitoring through spatial localization on time-frequency spectrograms. Vibration signals are transformed into continuous wavelet transform (CWT) spectrograms to improve the observability of weak and non-stationary fault signatures, and YOLOv9, YOLOv10, and YOLOv11 are employed to localize and identify fault-related energy regions. Experiments on the CWRU, PU, and IMS datasets show that the proposed framework improves the detectability and robustness of fault-related sensing patterns compared with conventional time-series models, modern vision backbones, and short-time Fourier transform (STFT)-based representations, achieving mAP values up to 99.4%, 97.8%, and 99.5%, respectively. In addition, the region-aware localization provides a more interpretable connection between time-frequency energy distributions and bearing fault characteristics. These results demonstrate that spatial localization on CWT spectrograms offers an effective and generalizable approach for enhancing vibration sensing capability in non-stationary environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes transforming bearing vibration signals into continuous wavelet transform (CWT) spectrograms and applying YOLOv9, YOLOv10, and YOLOv11 detectors to localize and classify fault-related energy regions on these 2D representations. Experiments on the CWRU, PU, and IMS datasets report mAP values up to 99.4%, 97.8%, and 99.5%, claiming superior detectability and robustness compared to conventional time-series models, modern vision backbones, and STFT-based representations, along with improved interpretability via spatial localization.

Significance. If the performance gains can be robustly attributed to the CWT representation and the spatial localization approach, the work could advance vibration-based fault monitoring by improving observability of weak, non-stationary signatures through time-frequency imaging combined with object detection. This offers a potentially more interpretable alternative to 1D time-series analysis for industrial bearing diagnostics.

major comments (2)
  1. [Abstract and experimental results] The manuscript reports high mAP values but provides no details on train-test splits, hyperparameter search procedures, or confirmation that data exclusions and augmentations were applied uniformly across all compared methods (time-series models, vision backbones, and STFT representations). This leaves open the possibility of post-hoc selection effects that could affect the validity of the cross-method comparisons.
  2. [Comparison to STFT-based representations] The central claim that CWT spectrograms meaningfully enhance observability of weak and non-stationary fault signatures (relative to STFT) and drive the reported mAP gains requires a controlled ablation that holds the YOLO detector, training protocol, data splits, and hyperparameters fixed while swapping only the time-frequency transform. Without this, the performance delta cannot be isolated from the general advantage of applying spatial object detection to any 2D energy map.
minor comments (1)
  1. [Method] Clarify the exact YOLO variants and any modifications made to the standard architectures or loss functions when applied to spectrogram inputs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and have revised the manuscript to enhance experimental transparency and isolate the contribution of the CWT representation.

  1. Referee: [Abstract and experimental results] The manuscript reports high mAP values but provides no details on train-test splits, hyperparameter search procedures, or confirmation that data exclusions and augmentations were applied uniformly across all compared methods (time-series models, vision backbones, and STFT representations). This leaves open the possibility of post-hoc selection effects that could affect the validity of the cross-method comparisons.

    Authors: We agree that the original manuscript did not provide sufficient detail on the experimental protocol, which is necessary for validating the cross-method comparisons. In the revised version, we have added a new subsection titled 'Experimental Protocol' that explicitly describes the train-test splits (stratified 70/30 split per dataset with fixed random seed for reproducibility), the hyperparameter search procedure (systematic grid search over learning rate, batch size, optimizer, and epochs with 5-fold cross-validation), and confirmation that data exclusions and augmentations were applied identically to all baselines, including time-series models, vision backbones, and STFT representations. These additions address concerns about post-hoc selection effects.

    Revision: yes

  2. Referee: [Comparison to STFT-based representations] The central claim that CWT spectrograms meaningfully enhance observability of weak and non-stationary fault signatures (relative to STFT) and drive the reported mAP gains requires a controlled ablation that holds the YOLO detector, training protocol, data splits, and hyperparameters fixed while swapping only the time-frequency transform. Without this, the performance delta cannot be isolated from the general advantage of applying spatial object detection to any 2D energy map.

    Authors: We concur that a strictly controlled ablation is required to isolate the effect of the time-frequency representation. While the original experiments applied the same YOLO detectors across representations, we have now conducted and included an additional controlled ablation study in the revised manuscript. This study fixes the detector (YOLOv11), training protocol, data splits, and all hyperparameters, varying only the transform (CWT versus STFT). The new results, reported in an expanded table and accompanying text, show consistent mAP gains for CWT (average +2.1% across the three datasets), supporting the claim that CWT improves observability of weak non-stationary signatures beyond the general benefit of 2D detection.

    Revision: yes
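
The stratified 70/30 split with a fixed seed mentioned in the first response could look like the sketch below. It is an illustration under stated assumptions, not the authors' protocol: `stratified_split`, the per-class rounding rule, and the default `seed=0` are hypothetical, and only the standard library is used.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps roughly the same train share.

    labels: one class label per sample. Returns (train_indices, test_indices),
    both sorted. The same seed always reproduces the same split.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local generator, leaves global state untouched
    train, test = [], []
    for label in sorted(by_class):  # fixed class order keeps the split deterministic
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Reusing the returned index lists for every baseline is what makes the cross-method comparison share identical data.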

Circularity Check

0 steps flagged

No significant circularity; empirical results from standard supervised training


The paper describes a standard ML pipeline: vibration signals are converted to CWT spectrograms and fed to off-the-shelf YOLO detectors for object detection on labeled data. Reported mAP values (99.4% CWRU, 97.8% PU, 99.5% IMS) are direct outputs of supervised training and evaluation on fixed benchmark splits. No equations, fitted parameters, or self-citations are presented that would make any claimed improvement equivalent to its own inputs by construction. The framework contains no derivation chain that collapses to a self-definition or renamed fit; performance numbers are externally falsifiable via replication on the same public datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that CWT spectrograms make weak fault signatures more observable than time-series or STFT inputs; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: CWT spectrograms improve observability of weak and non-stationary fault signatures compared with conventional representations
    Directly stated in the abstract as the motivation for the transformation step.

pith-pipeline@v0.9.0 · 5735 in / 1367 out tokens · 39136 ms · 2026-05-18T19:41:20.012999+00:00 · methodology


