Classification of systolic murmurs in heart sounds using multiresolution complex Gabor dictionary and vision transformer
Pith reviewed 2026-05-10 08:59 UTC · model grok-4.3
The pith
Projecting heart sound segments onto a shared multiresolution complex Gabor dictionary and classifying the resulting features with a vision transformer identifies four systolic murmur types at 95.96 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that enforcing a shared multiresolution complex Gabor dictionary when projecting multiple segments of a single recording produces consistent variable-resolution time-frequency feature matrices; feeding those matrices into a vision transformer that tokenizes each resolution separately, concatenates the embeddings, and applies multi-head attention with a 1x1 convolutional residual block yields reliable separation of four systolic murmur classes, demonstrated by 95.96 percent accuracy on the CirCor DigiScope dataset.
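The fusion step in this claim can be illustrated with a small sketch. It is not the paper's model: the matrix sizes, patch size, embedding width, head count, and random weights are all hypothetical, and layer normalization, activations, positional embeddings, and the classification head are left out. Patch tokenization is written as a linear map on flattened non-overlapping patches, which is equivalent to a convolution whose kernel size equals its stride.

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def tokenize(matrix, patch, proj):
    """Cut non-overlapping patch x patch blocks, flatten them, and project
    each one to the embedding width (conv with kernel = stride = patch)."""
    flat = [[matrix[r + i][c + j] for i in range(patch) for j in range(patch)]
            for r in range(0, len(matrix) - patch + 1, patch)
            for c in range(0, len(matrix[0]) - patch + 1, patch)]
    return matmul(flat, proj)

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    total = sum(e)
    return [v / total for v in e]

def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

def encoder_layer(x, heads, w_out, w_conv1):
    """Multi-head attention with a residual connection, followed by a
    kernel-size-1 convolution (a per-token linear map) with a residual."""
    per_head = [attention_head(x, *h) for h in heads]
    concat = [sum((h[t] for h in per_head), []) for t in range(len(x))]
    attn = matmul(concat, w_out)
    x = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attn)]
    conv = matmul(x, w_conv1)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, conv)]

rng = random.Random(0)
dim, n_heads, patch = 8, 2, 2                      # hypothetical sizes
fine = rand_matrix(4, 8, rng)                      # stand-in fine-resolution matrix
coarse = rand_matrix(4, 4, rng)                    # stand-in coarse-resolution matrix
# Each resolution gets its own tokenizer; the token lists are then concatenated.
tokens = (tokenize(fine, patch, rand_matrix(patch * patch, dim, rng))
          + tokenize(coarse, patch, rand_matrix(patch * patch, dim, rng)))
heads = [tuple(rand_matrix(dim, dim // n_heads, rng) for _ in range(3))
         for _ in range(n_heads)]
out = encoder_layer(tokens, heads, rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng))
```

Because the tokens from both resolutions sit in one sequence, every attention row mixes fine and coarse patches, which is the property the claim relies on.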
What carries the argument
A redundant dictionary of multiresolution complex Gabor basis functions whose projection weights, obtained under a shared-dictionary constraint via complex orthogonal matching pursuit, are reshaped into variable-resolution time-frequency matrices that serve as the multi-input to the vision transformer.
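A minimal sketch of this machinery for a single segment is given below, under stated assumptions. The atom parametrization (Gaussian width `scale`, shift `center`, normalized frequency `freq`), the grid of scales and frequencies, and the function names are all hypothetical; the paper's actual dictionary design is not reproduced here. The least-squares re-fit that distinguishes orthogonal matching pursuit from plain matching pursuit is carried out with Gram-Schmidt and back-substitution.

```python
import cmath
import math

def gabor_atom(n, scale, center, freq):
    """Unit-norm complex Gabor atom: Gaussian window times a complex exponential."""
    atom = [math.exp(-0.5 * ((t - center) / scale) ** 2)
            * cmath.exp(2j * math.pi * freq * t) for t in range(n)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]

def build_dictionary(n, scales, freqs):
    """One atom per (scale, center, freq); centers are spaced one scale apart,
    so short windows give fine time resolution and long windows coarse."""
    labels = [(s, c, f) for s in scales for c in range(0, n, s) for f in freqs]
    return [gabor_atom(n, *lab) for lab in labels], labels

def inner(a, x):
    """Complex inner product <a, x> = sum(conj(a) * x)."""
    return sum(ai.conjugate() * xi for ai, xi in zip(a, x))

def complex_omp(x, atoms, k):
    """Pick up to k atoms greedily and re-fit all selected weights after each pick."""
    residual, support = list(x), []
    qs, r_cols, coeffs = [], [], []          # orthonormal basis, columns of R, Q^H x
    for _ in range(k):
        scores = [abs(inner(a, residual)) for a in atoms]
        for j in support:
            scores[j] = -1.0                 # never pick the same atom twice
        best = max(range(len(atoms)), key=scores.__getitem__)
        col = [inner(q, atoms[best]) for q in qs]
        v = list(atoms[best])
        for r, q in zip(col, qs):            # remove components along earlier picks
            v = [vi - r * qi for vi, qi in zip(v, q)]
        nv = math.sqrt(sum(abs(t) ** 2 for t in v))
        if nv < 1e-12:                       # atom adds nothing new; stop early
            break
        q_new = [t / nv for t in v]
        support.append(best)
        qs.append(q_new)
        r_cols.append(col + [nv])
        c = inner(q_new, residual)
        coeffs.append(c)
        residual = [ri - c * qi for ri, qi in zip(residual, q_new)]
    weights = [0j] * len(support)
    for i in reversed(range(len(support))):  # back-substitution: solve R w = Q^H x
        acc = coeffs[i] - sum(r_cols[m][i] * weights[m]
                              for m in range(i + 1, len(support)))
        weights[i] = acc / r_cols[i][i]
    return support, weights, residual

# Synthetic check: a signal built from two known atoms.
atoms, labels = build_dictionary(64, scales=(4, 16), freqs=(0.125, 0.25, 0.375))
p, q = labels.index((16, 16, 0.125)), labels.index((4, 48, 0.375))
x = [2 * a + (1 - 1j) * b for a, b in zip(atoms[p], atoms[q])]
support, weights, residual = complex_omp(x, atoms, k=2)
```

On this synthetic signal the pursuit should recover the two generating atoms and their complex weights; real murmur segments would of course need many more atoms and a stopping rule.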
If this is right
- Enforcing a shared dictionary across multiple segments of one recording reduces the impact of natural murmur variability on the final feature matrices.
- Splitting and reshaping the projection weights into several resolution-specific matrices lets the classifier capture both fine-grained and coarse time-frequency structure.
- Concatenating patch tokens from separate convolutional front-ends before the transformer encoder allows a single attention layer to integrate information across resolutions.
- The reported 95.96 percent accuracy on four murmur types indicates that the combined sparse-dictionary and transformer pipeline can distinguish clinically relevant systolic patterns without hand-crafted acoustic descriptors.
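One way to realize the shared-dictionary constraint from the first point is a simultaneous pursuit, sketched below. This is a swapped-in, generic technique (simultaneous OMP), not necessarily the paper's formulation: at each step the atom with the largest summed correlation across all segments is chosen, and each segment then gets its own least-squares weights on that common support. The atom grid, the segment construction, and all function names are hypothetical.

```python
import cmath
import math

def gabor_atom(n, scale, center, freq):
    """Unit-norm complex Gabor atom (Gaussian window times complex exponential)."""
    atom = [math.exp(-0.5 * ((t - center) / scale) ** 2)
            * cmath.exp(2j * math.pi * freq * t) for t in range(n)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]

def inner(a, x):
    return sum(ai.conjugate() * xi for ai, xi in zip(a, x))

def least_squares(sel, x):
    """Weights of x on the selected atoms via Gram-Schmidt and back-substitution.
    Assumes the selected atoms are linearly independent."""
    qs, r_cols, coeffs = [], [], []
    for a in sel:
        col = [inner(q, a) for q in qs]
        v = list(a)
        for r, q in zip(col, qs):
            v = [vi - r * qi for vi, qi in zip(v, q)]
        nv = math.sqrt(sum(abs(t) ** 2 for t in v))
        qs.append([t / nv for t in v])
        r_cols.append(col + [nv])
        coeffs.append(inner(qs[-1], x))
    w = [0j] * len(sel)
    for i in reversed(range(len(sel))):
        acc = coeffs[i] - sum(r_cols[m][i] * w[m] for m in range(i + 1, len(sel)))
        w[i] = acc / r_cols[i][i]
    approx = [sum(wi * a[t] for wi, a in zip(w, sel)) for t in range(len(x))]
    return w, [xi - ai for xi, ai in zip(x, approx)]

def shared_support_omp(segments, atoms, k):
    """All segments share one set of atoms; weights stay segment-specific."""
    support = []
    residuals = [list(s) for s in segments]
    weights = [[] for _ in segments]
    for _ in range(k):
        # Joint selection: sum correlation magnitudes over every segment.
        scores = [sum(abs(inner(a, r)) for r in residuals) for a in atoms]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(len(atoms)), key=scores.__getitem__))
        fits = [least_squares([atoms[j] for j in support], s) for s in segments]
        weights = [w for w, _ in fits]
        residuals = [r for _, r in fits]
    return support, weights

labels = [(s, c, f) for s in (4, 16) for c in range(0, 64, s) for f in (0.125, 0.375)]
atoms = [gabor_atom(64, *lab) for lab in labels]
p, q = labels.index((16, 16, 0.125)), labels.index((4, 48, 0.375))
# Two synthetic "segments" built from the same atoms with different weights.
seg1 = [2.0 * a + 0.5 * b for a, b in zip(atoms[p], atoms[q])]
seg2 = [1j * a + 1.5 * b for a, b in zip(atoms[p], atoms[q])]
support, weights = shared_support_omp([seg1, seg2], atoms, k=2)
```

Because every segment is described on the same atoms, the resulting weight vectors line up index by index, which is what makes the downstream feature matrices comparable across segments.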
Where Pith is reading between the lines
- The same shared-dictionary strategy could be tested on diastolic murmurs or other transient cardiac sounds where segment-to-segment consistency is also an issue.
- Replacing the vision transformer with a lighter sequence model might reveal how much of the performance gain comes from the multiresolution Gabor front-end versus the attention mechanism.
- Cross-device validation on recordings from consumer-grade digital stethoscopes would directly test whether the learned basis functions transfer beyond the original dataset hardware.
- If the approach holds, it could be embedded in portable auscultation apps to flag murmur subtypes for remote review by cardiologists.
Load-bearing premise
The CirCor DigiScope recordings already contain enough real-world clinical variability that the shared-dictionary constraint will keep the extracted features discriminative when the same model encounters new patients, different recording devices, or unseen noise conditions.
What would settle it
Accuracy falling below 90 percent when the identical pipeline is evaluated on an independent collection of heart-sound recordings made with different stethoscopes or from patient populations absent from the original training set.
Systolic murmurs are extra heart sounds that occur during the contraction phase of the cardiac cycle, often indicating heart abnormalities caused by turbulent blood flow. Their intensity, pitch, and quality vary, requiring precise identification for the accurate diagnosis of cardiac disorders. This study presents an automatic classification system for systolic murmurs using a feature extraction module, followed by a classification model. The feature extraction module employs complex orthogonal matching pursuit to project single or multiple murmur segments onto a redundant dictionary composed of multiresolution complex Gabor basis functions (GBFs). The resulting projection weights are split and reshaped into variable-resolution time-frequency feature matrices. Processing multiple segments of a single recording using a shared dictionary mitigates murmur variability. This is achieved by learning the weights for each segment while enforcing that they correspond to the same set of basis functions in the dictionary, promoting consistent time-frequency feature matrices. The classification model is built based on a vision transformer to process multiple input matrices of different resolutions by passing each through a convolutional neural network for patch tokenization. All embedding tokens are then concatenated to form a matrix and forwarded to an encoder layer that includes multihead attention, residual connections, and a convolutional network with a kernel size of one. This integration of multiresolution feature extraction with transformer-based feature classification enhances the accuracy and reliability of heart murmur identification. An experimental analysis of four types of systolic murmurs from the CirCor DigiScope dataset demonstrates the effectiveness of the system, achieving a classification accuracy of 95.96%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an automatic classification system for four types of systolic murmurs from heart sounds. It employs complex orthogonal matching pursuit to project single or multiple murmur segments onto a redundant multiresolution complex Gabor dictionary, yielding sparse coefficients that are split and reshaped into variable-resolution time-frequency feature matrices. A shared dictionary is enforced across segments of the same recording to promote consistency in the extracted features. These matrices are fed to a vision transformer that tokenizes each resolution level via a CNN, concatenates the resulting embeddings, and processes them through an encoder layer incorporating multi-head attention, residual connections, and a 1x1 convolutional network. An experimental analysis on the CirCor DigiScope dataset reports a classification accuracy of 95.96%.
Significance. If the performance claims hold under rigorous validation, the work could contribute to automated cardiac auscultation by showing how multiresolution sparse time-frequency representations can be integrated with a transformer to manage intra-recording variability. The shared-dictionary constraint in the OMP stage is a reasonable design choice for enforcing consistency, and the multi-resolution ViT handling is a plausible way to fuse features at different scales. However, the absence of standard experimental controls substantially reduces the immediate significance of the reported accuracy.
major comments (2)
- [Abstract] Abstract: The central claim of 95.96% classification accuracy on four systolic-murmur classes is stated without any description of the train-test split, cross-validation strategy, baseline comparisons, statistical significance testing, or class-imbalance handling on the CirCor DigiScope dataset. These details are load-bearing for evaluating whether the empirical result supports the effectiveness of the multiresolution Gabor + ViT pipeline.
- [Experimental Analysis] Experimental Analysis section: No information is given on whether recordings were partitioned at the patient level, or whether the shared-dictionary projection was applied in a way that could leak information across segments of the same patient. This bears directly on the weakest assumption, namely that the features remain discriminative for new patients, devices, or noise conditions.
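The patient-level partition raised in the second major comment could be enforced along the following lines. This is a generic sketch, not the authors' protocol: the `(patient_id, recording_id)` pair layout, the function name, and the default split fraction are assumptions, and it presumes a patient may contribute several recordings.

```python
import random
from collections import defaultdict

def patient_level_split(recordings, test_fraction=0.2, seed=0):
    """Split (patient_id, recording_id) pairs so that no patient appears on
    both sides. Whole patients, not individual recordings, are assigned."""
    by_patient = defaultdict(list)
    for patient_id, recording_id in recordings:
        by_patient[patient_id].append(recording_id)
    patients = sorted(by_patient)                # fixed order before shuffling
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [(p, r) for p, r in recordings if p not in test_patients]
    test = [(p, r) for p, r in recordings if p in test_patients]
    return train, test

# Hypothetical identifiers, for illustration only.
recs = [("p1", "a"), ("p1", "b"), ("p2", "a"), ("p3", "a"),
        ("p3", "b"), ("p4", "a"), ("p5", "a")]
train, test = patient_level_split(recs, test_fraction=0.4, seed=1)
```

Segment extraction and any per-recording sparse coding would then run separately inside each side of the split, so that no segment of a test patient influences training.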
minor comments (2)
- [Feature Extraction Module] The description of how the projection weights are split and reshaped into variable-resolution time-frequency matrices would benefit from an explicit equation or pseudocode to clarify the matrix dimensions and resolution levels.
- [Classification Model] The vision transformer encoder is described at a high level; a diagram showing the CNN tokenization per resolution, concatenation, and single encoder layer would improve clarity of the multi-resolution fusion.
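The split-and-reshape step that the first minor comment asks to see spelled out might look like the sketch below. The layout is an assumption, not the paper's: weights are taken to be ordered level by level and time-major within a level, each level is described by a hypothetical `(n_time, n_freq)` pair, and complex weights are reduced to magnitudes.

```python
def split_and_reshape(weights, levels):
    """Split a flat weight vector into one n_freq x n_time magnitude matrix
    per resolution level. `levels` lists (n_time, n_freq) for each level."""
    matrices, start = [], 0
    for n_time, n_freq in levels:
        chunk = weights[start:start + n_time * n_freq]
        if len(chunk) != n_time * n_freq:
            raise ValueError("weight vector shorter than the declared levels")
        # Row index = frequency bin, column index = time shift.
        matrices.append([[abs(chunk[t * n_freq + f]) for t in range(n_time)]
                         for f in range(n_freq)])
        start += n_time * n_freq
    if start != len(weights):
        raise ValueError("weight vector longer than the declared levels")
    return matrices

levels = [(16, 3), (4, 3)]            # hypothetical: fine level, then coarse level
weights = [0j] * 60                   # 16*3 + 4*3 coefficients, mostly zero (sparse)
weights[5 * 3 + 1] = 3 + 4j           # fine level, time index 5, frequency index 1
weights[48 + 2 * 3 + 2] = -2j         # coarse level, time index 2, frequency index 2
fine, coarse = split_and_reshape(weights, levels)
```

With these sizes the fine matrix is 3 x 16 and the coarse matrix is 3 x 4, which is the kind of explicit dimension bookkeeping the comment requests.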
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes a standard empirical ML pipeline: complex OMP projects heart-sound segments onto a fixed multiresolution complex Gabor dictionary to produce sparse coefficients that are reshaped into time-frequency matrices; a shared dictionary is used across segments of one recording; these matrices are tokenized by CNNs and fed to a single-layer vision transformer. The 95.96% accuracy is obtained by training and testing this architecture on the CirCor DigiScope dataset. No equation reduces the reported accuracy to a fitted constant, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in via prior work. The central claim is therefore an experimental result rather than an algebraic identity or self-referential definition.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number and choice of resolution levels in the Gabor dictionary
- Vision transformer hyperparameters (patch size, number of heads, layers)
axioms (1)
- domain assumption: Heart-sound segments admit a sparse representation in the multiresolution complex Gabor dictionary.
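Written out, the sparsity assumption takes roughly the following form. The notation is generic and assumed here rather than taken from the paper: x is a segment of length N, g_k are dictionary atoms, w_k are complex weights, Λ is the selected support of at most K atoms, and (s, u, f) are the scale, time shift, and normalized frequency of an atom, with c_{s,u,f} a unit-norm constant.

```latex
x[n] \;\approx\; \sum_{k \in \Lambda} w_k \, g_k[n],
\qquad |\Lambda| \le K \ll N_{\text{atoms}},
\qquad
g_{s,u,f}[n] \;=\; c_{s,u,f}\,
  \exp\!\left(-\frac{(n-u)^2}{2 s^2}\right) e^{\,j 2\pi f n}
```

The two free parameters listed above enter through the set of scales s used to build the dictionary and, downstream, through how the weights w_k are arranged for the classifier.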