Classification of systolic murmurs in heart sounds using multiresolution complex Gabor dictionary and vision transformer

Abeer FathAllah Brery; Mahmoud Fakhry

arxiv: 2604.16563 · v1 · submitted 2026-04-17 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-10 08:59 UTC · model grok-4.3

keywords: systolic murmurs · heart sound classification · Gabor dictionary · orthogonal matching pursuit · vision transformer · time-frequency features · cardiac signal processing

The pith

Projecting heart sound segments onto a shared multiresolution complex Gabor dictionary and classifying the resulting features with a vision transformer identifies four systolic murmur types at 95.96 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an automatic classifier for systolic murmurs, extra heart sounds during the contraction phase that often point to turbulent blood flow and valve problems. It first extracts time-frequency features by applying complex orthogonal matching pursuit to one or more segments of a recording against a redundant dictionary of multiresolution complex Gabor basis functions, with the key constraint that all segments share the same dictionary atoms so the derived feature matrices remain consistent despite natural murmur variation. These variable-resolution matrices are tokenized by separate convolutional networks, concatenated, and passed through a transformer encoder that uses multi-head attention and residual connections. On the CirCor DigiScope collection the method reaches 95.96 percent accuracy across four murmur categories, which would matter for supporting faster and more consistent identification of cardiac abnormalities during routine listening exams.

Core claim

The central claim is that enforcing a shared multiresolution complex Gabor dictionary when projecting multiple segments of a single recording produces consistent variable-resolution time-frequency feature matrices; feeding those matrices into a vision transformer that tokenizes each resolution separately, concatenates the embeddings, and applies multi-head attention with a 1x1 convolutional residual block yields reliable separation of four systolic murmur classes, demonstrated by 95.96 percent accuracy on the CirCor DigiScope dataset.
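The fusion step in this claim can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature matrices are taken to be real-valued (for example, magnitudes of the projection weights), each tokenizer is reduced to a single strided convolution (written here as a patch-wise linear map), and all weights are random. The names `patch_tokens`, `multihead_attention`, and `encoder_layer`, and every size used, are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_tokens(M, patch, W):
    """Cut M into non-overlapping patch x patch tiles and project each tile to d dims.

    Equivalent to one convolution with kernel = stride = patch; the paper's
    CNN tokenizers may be deeper (assumption).
    """
    h, w = M.shape[0] // patch, M.shape[1] // patch
    tiles = M[:h * patch, :w * patch].reshape(h, patch, w, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(h * w, patch * patch) @ W          # (n_tokens, d)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(T, Wq, Wk, Wv, Wo, heads):
    n, d = T.shape
    dh = d // heads
    split = lambda A: A.reshape(n, heads, dh).transpose(1, 0, 2)   # (heads, n, dh)
    Q, K, V = split(T @ Wq), split(T @ Wk), split(T @ Wv)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))            # (heads, n, n)
    return (A @ V).transpose(1, 0, 2).reshape(n, d) @ Wo

def encoder_layer(T, p, heads):
    # First residual connection wraps the attention block.
    T = T + multihead_attention(T, p["Wq"], p["Wk"], p["Wv"], p["Wo"], heads)
    # A kernel-size-one convolution acts on each token independently,
    # so it is written here as a per-token linear map with a ReLU.
    return T + np.maximum(T @ p["W1"], 0.0) @ p["W2"]

# Three hypothetical resolutions of the same segment (time shifts x frequencies).
d, heads, patch = 16, 4, 4
mats = [rng.standard_normal(s) for s in [(32, 16), (16, 16), (8, 16)]]
tokens = np.concatenate(
    [patch_tokens(M, patch, 0.1 * rng.standard_normal((patch * patch, d))) for M in mats]
)
params = {k: 0.1 * rng.standard_normal((d, d)) for k in ("Wq", "Wk", "Wv", "Wo", "W1", "W2")}
out = encoder_layer(tokens, params, heads)
```

Because the tokens from all resolutions sit in one matrix before attention, every token can attend to tokens from every other resolution, which is the cross-resolution integration the claim relies on.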

What carries the argument

A redundant dictionary of multiresolution complex Gabor basis functions whose projection weights, obtained under a shared-dictionary constraint via complex orthogonal matching pursuit, are reshaped into variable-resolution time-frequency matrices that serve as the multi-input to the vision transformer.
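As a rough illustration of what such a dictionary might look like, the sketch below builds unit-norm complex Gabor atoms (a Gaussian window multiplied by a complex exponential) on a grid of time shifts and frequencies for several window scales. The grid spacing, the scales, the segment length, and the function names are all assumptions made for the example; the paper's actual parameterization is not given on this page.

```python
import numpy as np

def gabor_atom(n, center, freq, scale):
    """Unit-norm complex Gabor atom: Gaussian window times a complex exponential."""
    t = np.arange(n)
    window = np.exp(-0.5 * ((t - center) / scale) ** 2)
    atom = window * np.exp(2j * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def multires_gabor_dictionary(n, scales, n_freqs):
    """Stack atoms for every (scale, time shift, frequency) triple as columns.

    Returns the dictionary D (n x n_atoms) and, for each scale, the
    (n_shifts, n_freqs) shape of its time-frequency grid.
    """
    atoms, shapes = [], []
    freqs = np.arange(n_freqs) / (2.0 * n_freqs)    # normalized frequencies in [0, 0.5)
    for scale in scales:
        hop = max(1, int(scale))                    # assumed: hop equals the window scale
        centers = np.arange(0, n, hop)
        for c in centers:
            for f in freqs:
                atoms.append(gabor_atom(n, c, f, scale))
        shapes.append((len(centers), n_freqs))
    return np.stack(atoms, axis=1), shapes

# Hypothetical sizes: 256-sample segment, three resolutions, 16 frequency bins.
D, shapes = multires_gabor_dictionary(n=256, scales=(8, 16, 32), n_freqs=16)
```

With these illustrative settings the dictionary has 896 atoms for a 256-sample segment, so it is redundant (more atoms than samples), and narrower windows contribute finer time grids.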

If this is right

Enforcing a shared dictionary across multiple segments of one recording reduces the impact of natural murmur variability on the final feature matrices.
Splitting and reshaping the projection weights into several resolution-specific matrices lets the classifier capture both fine-grained and coarse time-frequency structure.
Concatenating patch tokens from separate convolutional front-ends before the transformer encoder allows a single attention layer to integrate information across resolutions.
The reported 95.96 percent accuracy on four murmur types indicates that the combined sparse-dictionary and transformer pipeline can distinguish clinically relevant systolic patterns without hand-crafted acoustic descriptors.
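One way to read the shared-dictionary constraint in the first point is as a simultaneous orthogonal matching pursuit: at each step a single atom is chosen for all segments by pooling their correlations, and each segment then gets its own weights on that common support. The sketch below follows that reading. It is an assumption about how the constraint could be implemented, not the authors' algorithm, and `shared_support_omp` is a hypothetical name. A unitary DFT matrix stands in for the Gabor dictionary so the example is easy to check.

```python
import numpy as np

def shared_support_omp(D, X, n_atoms):
    """Greedy OMP that selects one common set of atoms for all columns of X.

    D: (n, m) complex dictionary with unit-norm columns.
    X: (n, s) matrix whose columns are segments of one recording.
    Returns the selected atom indices and an (m, s) complex weight matrix
    that is nonzero only on those indices.
    """
    X = X.astype(complex)
    residual = X.copy()
    support = []
    coef = np.zeros((0, X.shape[1]), dtype=complex)
    for _ in range(n_atoms):
        # Correlate every atom with every segment residual, then pool over segments.
        score = np.sum(np.abs(D.conj().T @ residual), axis=1)
        if support:
            score[support] = 0.0                    # never pick the same atom twice
        support.append(int(np.argmax(score)))
        # Least-squares refit of all segments on the shared support.
        coef, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        residual = X - D[:, support] @ coef
    W = np.zeros((D.shape[1], X.shape[1]), dtype=complex)
    W[support, :] = coef
    return support, W

# Toy check: two segments built from the same two atoms with different weights.
n = 32
F = np.fft.fft(np.eye(n)) / np.sqrt(n)              # unitary stand-in dictionary
X = np.stack([2 * F[:, 3] + 1j * F[:, 7], (0.5 - 1j) * F[:, 3] + 3 * F[:, 7]], axis=1)
support, W = shared_support_omp(F, X, n_atoms=2)
```

Both columns of `W` are nonzero on the same rows, so the time-frequency matrices derived from them have their energy at the same grid positions even though the per-segment weights differ.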

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same shared-dictionary strategy could be tested on diastolic murmurs or other transient cardiac sounds where segment-to-segment consistency is also an issue.
Replacing the vision transformer with a lighter sequence model might reveal how much of the performance gain comes from the multiresolution Gabor front-end versus the attention mechanism.
Cross-device validation on recordings from consumer-grade digital stethoscopes would directly test whether the learned basis functions transfer beyond the original dataset hardware.
If the approach holds, it could be embedded in portable auscultation apps to flag murmur subtypes for remote review by cardiologists.

Load-bearing premise

The CirCor DigiScope recordings already contain enough real-world clinical variability that the shared-dictionary constraint will keep the extracted features discriminative when the same model encounters new patients, different recording devices, or unseen noise conditions.

What would settle it

Accuracy falling below 90 percent when the identical pipeline is evaluated on an independent collection of heart-sound recordings made with different stethoscopes or from patient populations absent from the original training set.

Figures

Figures reproduced from arXiv: 2604.16563 by Abeer FathAllah Brery, Mahmoud Fakhry.

**Figure 2.** Shapes of different types of systolic heart murmurs in the time domain.

**Figure 3.** Block diagram of the developed system.

**Figure 4.** Block diagram of the developed feature extraction module, mainly built …

**Figure 5.** Feature extraction from a murmur segment of diamond type using COMP …

**Figure 6.** Feature extraction from two murmur segments from the same heart sound …

**Figure 7.** Examples of complex atoms with three different resolutions for …

**Figure 8.** Block diagram of the developed vision transformer encoder-based model with …

**Figure 9.** Block diagram of the multihead attention with …

**Figure 10.** The evolution of the training and validation loss and accuracy across epochs.

Original abstract

Systolic murmurs are extra heart sounds that occur during the contraction phase of the cardiac cycle, often indicating heart abnormalities caused by turbulent blood flow. Their intensity, pitch, and quality vary, requiring precise identification for the accurate diagnosis of cardiac disorders. This study presents an automatic classification system for systolic murmurs using a feature extraction module, followed by a classification model. The feature extraction module employs complex orthogonal matching pursuit to project single or multiple murmur segments onto a redundant dictionary composed of multiresolution complex Gabor basis functions (GBFs). The resulting projection weights are split and reshaped into variable-resolution time–frequency feature matrices. Processing multiple segments of a single recording using a shared dictionary mitigates murmur variability. This is achieved by learning the weights for each segment while enforcing that they correspond to the same set of basis functions in the dictionary, promoting consistent time–frequency feature matrices. The classification model is built based on a vision transformer to process multiple input matrices of different resolutions by passing each through a convolutional neural network for patch tokenization. All embedding tokens are then concatenated to form a matrix and forwarded to an encoder layer that includes multihead attention, residual connections, and a convolutional network with a kernel size of one. This integration of multiresolution feature extraction with transformer-based feature classification enhances the accuracy and reliability of heart murmur identification. An experimental analysis of four types of systolic murmurs from the CirCor DigiScope dataset demonstrates the effectiveness of the system, achieving a classification accuracy of 95.96%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper reports 95.96% accuracy on four systolic murmur classes from the CirCor dataset using shared-dictionary multiresolution Gabor OMP plus a multi-resolution ViT, but supplies no experimental protocol details.

The central result is a 95.96% classification accuracy on four types of systolic murmurs drawn from the CirCor DigiScope recordings. The pipeline extracts sparse time-frequency features via complex orthogonal matching pursuit on a multiresolution Gabor dictionary, reshapes the coefficients into variable-resolution matrices, and feeds them to a vision transformer that tokenizes each matrix with a CNN before concatenating embeddings for a single encoder layer with multi-head attention.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an automatic classification system for four types of systolic murmurs from heart sounds. It employs complex orthogonal matching pursuit to project single or multiple murmur segments onto a redundant multiresolution complex Gabor dictionary, yielding sparse coefficients that are split and reshaped into variable-resolution time-frequency feature matrices. A shared dictionary is enforced across segments of the same recording to promote consistency in the extracted features. These matrices are fed to a vision transformer that tokenizes each resolution level via a CNN, concatenates the resulting embeddings, and processes them through an encoder layer incorporating multi-head attention, residual connections, and a 1x1 convolutional network. An experimental analysis on the CirCor DigiScope dataset reports a classification accuracy of 95.96%.

Significance. If the performance claims hold under rigorous validation, the work could contribute to automated cardiac auscultation by showing how multiresolution sparse time-frequency representations can be integrated with a transformer to manage intra-recording variability. The shared-dictionary constraint in the OMP stage is a reasonable design choice for enforcing consistency, and the multi-resolution ViT handling is a plausible way to fuse features at different scales. However, the absence of standard experimental controls substantially reduces the immediate significance of the reported accuracy.

major comments (2)

[Abstract] The central claim of 95.96% classification accuracy on four systolic-murmur classes is stated without any description of the train-test split, cross-validation strategy, baseline comparisons, statistical significance testing, or class-imbalance handling on the CirCor DigiScope dataset. These details are load-bearing for evaluating whether the empirical result supports the effectiveness of the multiresolution Gabor + ViT pipeline.
[Experimental Analysis] No information is given on whether recordings were partitioned at the patient level or whether the shared dictionary was learned in a way that could introduce leakage across segments of the same patient. This directly affects the weakest assumption, namely that the features remain discriminative for new patients, devices, or noise conditions.
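For readers unfamiliar with the leakage concern in the second comment, the sketch below shows what a patient-level partition looks like: patients, not segments, are assigned to train or test, so every segment from a given patient lands on one side only. This is a generic illustration, not the protocol used in the paper (which the report says is unspecified). The patient identifiers and the function name `patient_level_split` are hypothetical.

```python
import numpy as np

def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Return boolean train/test masks such that no patient appears in both."""
    patient_ids = np.asarray(patient_ids)
    unique = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)                              # shuffle patients, not segments
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(patient_ids, unique[:n_test])
    return ~test_mask, test_mask

# One entry per murmur segment; several segments share a patient.
ids = np.array(["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"])
train, test = patient_level_split(ids, test_fraction=0.4)
```

A segment-level random split of the same data could place two segments from patient `p3` on opposite sides, which is the kind of leakage that tends to inflate reported accuracy.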

minor comments (2)

[Feature Extraction Module] The description of how the projection weights are split and reshaped into variable-resolution time-frequency matrices would benefit from an explicit equation or pseudocode to clarify the matrix dimensions and resolution levels.
[Classification Model] The vision transformer encoder is described at a high level; a diagram showing the CNN tokenization per resolution, concatenation, and single encoder layer would improve clarity of the multi-resolution fusion.
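The kind of pseudocode the first minor comment asks for might look like the sketch below. It assumes the dictionary atoms are ordered resolution by resolution, and within each resolution by time shift and then frequency, so that the flat weight vector can be cut into consecutive blocks and each block reshaped to a (time shifts × frequencies) grid. That ordering, the grid sizes, and the name `weights_to_matrices` are assumptions for illustration only.

```python
import numpy as np

def weights_to_matrices(w, shapes):
    """Split a flat weight vector into one (n_shifts, n_freqs) matrix per resolution."""
    sizes = [r * c for r, c in shapes]
    if sum(sizes) != w.size:
        raise ValueError("weight vector length does not match the dictionary layout")
    blocks = np.split(w, np.cumsum(sizes)[:-1])      # consecutive blocks, one per resolution
    return [b.reshape(s) for b, s in zip(blocks, shapes)]

# Hypothetical layout: three resolutions with 32, 16 and 8 time shifts, 16 frequencies each.
shapes = [(32, 16), (16, 16), (8, 16)]
w = np.zeros(sum(r * c for r, c in shapes), dtype=complex)
w[5] = 1 + 2j                                        # finest resolution, shift 0, frequency 5
w[512 + 2 * 16 + 3] = -1j                            # middle resolution, shift 2, frequency 3
mats = weights_to_matrices(w, shapes)
```

Stating the dimensions this explicitly would also make clear why the classifier needs a separate tokenizer per input: the matrices differ in their number of rows.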

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes a standard empirical ML pipeline: complex OMP projects heart-sound segments onto a fixed multiresolution complex Gabor dictionary to produce sparse coefficients that are reshaped into time-frequency matrices; a shared dictionary is used across segments of one recording; these matrices are tokenized by CNNs and fed to a single-layer vision transformer. The 95.96% accuracy is obtained by training and testing this architecture on the CirCor DigiScope dataset. No equation reduces the reported accuracy to a fitted constant, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in via prior work. The central claim is therefore an experimental result rather than an algebraic identity or self-referential definition.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method rests on standard assumptions of sparse representability of heart sounds by Gabor atoms and on conventional deep-learning training procedures; no new physical entities or ungrounded axioms are introduced.

free parameters (2)

Number and choice of resolution levels in the Gabor dictionary
Determines the variable-resolution feature matrices and is selected to balance detail and consistency.
Vision transformer hyperparameters (patch size, number of heads, layers)
Chosen or tuned for the concatenated multi-resolution token matrix.

axioms (1)

Domain assumption: Heart-sound segments admit a sparse representation in the multiresolution complex Gabor dictionary.
Invoked by the use of complex orthogonal matching pursuit to obtain projection weights.
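Written out, the assumption amounts to the usual sparse synthesis model, together with a common-support condition for segments of one recording. The symbols below (segment $\mathbf{x}_s$ of length $N$, dictionary $\mathbf{D}$ with $M$ atoms, weight vector $\mathbf{w}_s$, sparsity level $K$, support set $\Omega$) are notation introduced here for illustration and may differ from the paper's.

```latex
\mathbf{x}_s \approx \mathbf{D}\,\mathbf{w}_s,
\qquad \mathbf{D} \in \mathbb{C}^{N \times M},\; M > N,
\qquad \|\mathbf{w}_s\|_0 \le K \ll M,
\qquad \operatorname{supp}(\mathbf{w}_s) = \Omega \;\; \text{for every segment } s .
```

If heart-sound segments needed close to $M$ atoms for a good approximation, the greedy pursuit would stop short of a faithful representation and the resulting feature matrices would carry little class information.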

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] World Health Organization, Cardiovascular Diseases (CVDs), 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)

[2] H. Jung, L. S. Lilly, The cardiac cycle: Mechanisms of heart sounds and murmurs, in Pathophysiology of Heart Disease: A Collaborative Project of Medical Students and Faculty, 5th edition, Chapter 2, Philadelphia, (2011), 28–53

[3] B. Boashash, Time–Frequency Signal Analysis and Processing: A Comprehensive Reference, 2nd edition, Academic Press, Oxford, 2015

[4] S. Mallat, Z. Zhang, Matching pursuit with time–frequency dictionaries, IEEE Trans. Signal Process., 41 (1993), 3397–3415. https://doi.org/10.1109/78.258082

[5] X. Zhang, L. G. Durand, L. Senhadji, H. C. Lee, J. L. Coatrieux, Analysis-synthesis of the phonocardiogram based on the matching pursuit method, IEEE Trans. Biomed. Eng., 45 (1998), 962–971. https://doi.org/10.1109/10.704865

[6] X. Zhang, L. Durand, L. Senhadji, H. Lee, J. L. Coatrieux, Time–frequency scaling transformation of the phonocardiogram based of the matching pursuit method, IEEE Trans. Biomed. Eng., 45 (1998), 972–979. https://doi.org/10.1109/10.704866

[7] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2017

[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, (2017), 5998–6008

[9] Y. Wang, Y. Deng, Y. Zheng, P. Chattopadhyay, L. Wang, Vision transformers for image classification: A comparative survey, Technologies, 13 (2025), 32. https://doi.org/10.3390/technologies13010032

[10] J. Oliveira, F. Renna, P. D. Costa, M. Nogueira, C. Oliveira, C. Ferreira, et al., The circor digiscope dataset: From murmur detection to murmur classification, IEEE J. Biomed. Health. Inf., 26 (2022), 2524–2535. https://doi.org/10.1109/JBHI.2021.3137048

[11] R. O. Bonow, D. L. Mann, D. P. Zipes, P. Libby, Braunwald’s Heart Disease: A Textbook of Cardiovascular Medicine, 9th edition, Elsevier Health Sciences, Philadelphia, 2011

[12] J. Singh, R. Anand, Computer aided analysis of phonocardiogram, J. Med. Eng. Technol., 31 (2007), 319–323. https://doi.org/10.1080/03091900500282772

[13] Y. Zheng, X. Guo, X. Ding, A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification, Expert Syst. Appl., 42 (2015), 2710–2721. https://doi.org/10.1016/j.eswa.2014.10.051

[14] P. D. Stein, H. N. Sabbah, J. B. Lakier, S. R. Kemp, D. J. Magilligan, Frequency content of heart sounds and systolic murmurs in patients with porcine bioprosthetic valves: Diagnostic value for the early detection of valvular degeneration, Henry Ford Hosp. Med. J., 30 (1982), 119–123

[15] M. Akay, Time–Frequency and Wavelets in Biomedical Signal Processing, Wiley-IEEE Press, New York, 1998

[16] M. Fakhry, A. Gallardo-Antolín, Variational mode decomposition and a light cnn-lstm model for classification of heart sound signals, in IEEE EUROCON 2023 - 20th International Conference on Smart Technologies, Torino, Italy, (2023), 295–300. https://doi.org/10.1109/EUROCON56442.2023.10199054

[17] N. Atanasov, T. Ning, Isolation of systolic heart murmurs using wavelet transform and energy index, in 2008 Congress on Image and Signal Processing, Sanya, China, (2008), 216–

[18] https://doi.org/10.1109/CISP.2008.758

[19] A. Haghighi-Mood, N. Torry, Time–frequency analysis of systolic murmurs, in Computers in Cardiology 1997, IEEE, (1997), 113–116. https://doi.org/10.1109/CIC.1997.647843

[20] J. G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A, 2 (1985), 1160–1169. https://doi.org/10.1364/JOSAA.2.001160

[21] M. Fakhry, A. F. Brery, A. Gallardo-Antolín, Analysis of heart sound signals using sparse modeling with gabor dictionary, in 2022 IEEE International Symposium on Multimedia (ISM), Italy, (2022), 92–96. https://doi.org/10.1109/ISM55400.2022.00021

[22] M. Fakhry, A. Gallardo-Antolín, Elastic net regularization and gabor dictionary for classification of heart sound signals using deep learning, Eng. Appl. Artif. Intell., 127 (2024), 107406. https://doi.org/10.1016/j.engappai.2023.107406

[23] M. Fakhry, A. Gallardo-Antolín, Analysis of systolic murmurs in heart sounds using multiresolution complex gabor dictionary, in 2024 International Conference on Computer and Applications (ICCA), IEEE, Cairo, Egypt, (2024), 783–787. https://doi.org/10.1109/ICCA62237.2024.10927981

[24] S. Jabbari, H. Ghassemian, Modeling of heart systolic murmurs based on multivariate matching pursuit for diagnosis of valvular disorders, Comput. Biol. Med., 41 (2011), 802–811. https://doi.org/10.1016/j.compbiomed.2011.06.016

[25] M. Shabbir, X. Liu, M. Nasseri, S. Helgeson, Heart murmur classification in phonocardiogram representations using convolutional neural networks, in The International FLAIRS Conference Proceedings, 36 (2023). https://doi.org/10.32473/flairs.36.133189

[26] C. Yin, Y. Zheng, X. Ding, Y. Shi, J. Qin, X. Guo, Detection of coronary artery disease based on clinical phonocardiogram and multiscale attention convolutional compression network, IEEE J. Biomed. Health. Inf., 28 (2024), 1353–1362. https://doi.org/10.1109/JBHI.2024.3354832

[27] J. Kim, G. Park, B. Suh, Classification of phonocardiogram recordings using vision transformer architecture, in 2022 Computing in Cardiology (CinC), IEEE, Tampere, Finland, 498 (2022), 1–4. https://doi.org/10.22489/CinC.2022.084

[28] Z. Liu, H. Jiang, F. W. Zhang, W. B. Ouyang, X. Li, X. Pan, Heart sound classification based on bispectrum features and vision transformer mode, Alexandria Eng. J., 85 (2023), 49–59. https://doi.org/10.1016/j.aej.2023.11.035

[29] J. Han, A. Shaout, Enact-heart – ensemble-based assessment using cnn and transformer on heart sounds, preprint, arXiv:2502.16914. https://doi.org/10.48550/arXiv.2502.16914

[30] W. Zhao, H. Ma, N. Jin, Y. Zheng, X. Guo, Detection of coronary heart disease based on heart sound and hybrid vision transformer, Appl. Acoust., 230 (2025), 110420. https://doi.org/10.1016/j.apacoust.2024.110420

[31] R. Wang, Y. Duan, Y. Li, D. Zheng, X. Liu, C. T. Lam, et al., Pctmf-net: heart sound classification with parallel CNNs-transformer and second-order spectral analysis, Vis. Comput., 39 (2023), 3811–3822. https://doi.org/10.1007/s00371-023-03031-5

[32] S. Qiu, H. G. Feichtinger, Discrete gabor structures and optimal representations, IEEE Trans. Signal Process., 43 (1995), 2258–2268. https://doi.org/10.1109/78.469862

[33] I. Rish, G. Grabarnik, Sparse Modeling: Theory, Algorithms, and Applications, CRC Press, 2014

[34] Z. Průša, N. Holighaus, P. Balázs, Fast matching pursuit with multi-Gabor dictionaries, ACM Trans. Math. Software, 47 (2021), 1–20. https://doi.org/10.1145/3447958

[35] Z. Zhang, S. Wei, D. Wei, L. Li, F. Liu, C. Liu, Comparison of four recovery algorithms used in compressed sensing for ecg signal processing, in 2016 Computing in Cardiology Conference (CinC), IEEE, (2016), 401–404

[36] W. Dai, O. Milenkovic, Subspace pursuit for compressive sensing signal reconstruction, IEEE Trans. Inf. Theory, 55 (2008), 2230–2249. https://doi.org/10.1109/TIT.2009.2016006

[37] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770–778.

Electronic Research Archive Volume 34, Issue 3, xxx–xxx

© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons....
