Scaling to Multimodal and Multichannel Heart Sound Classification with Synthetic and Augmented Biosignals

Kayapanda Mandana; Matthew Fynn; Milan Marocchi; Yue Rong

arXiv: 2509.11606 · v4 · pith:HM73O6WNnew · submitted 2025-09-15 · cs.SD · cs.LG · eess.SP

Milan Marocchi, Matthew Fynn, Kayapanda Mandana, Yue Rong

classification: cs.SD · cs.LG · eess.SP

keywords: multichannel, augmented, dataset, datasets, heart, accuracy, models, multimodal

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for approximately 17.9 million deaths each year. Early detection is critical, creating a demand for accurate and inexpensive pre-screening methods. Deep learning has recently been applied to classify abnormal heart sounds indicative of CVDs using synchronised phonocardiogram (PCG) and electrocardiogram (ECG) signals, as well as multichannel PCG (mPCG). However, state-of-the-art architectures remain underutilised due to the limited availability of synchronised and multichannel datasets. Augmented datasets and pre-trained models provide a pathway to overcome these limitations, enabling transformer-based architectures to be trained effectively. This work combines traditional signal processing with denoising diffusion models, WaveGrad and DiffWave, to create an augmented dataset to fine-tune a Wav2Vec 2.0-based classifier on multimodal and multichannel heart sound datasets. The approach achieves state-of-the-art performance. On the Computing in Cardiology (CinC) 2016 dataset of single-channel PCG, accuracy, unweighted average recall (UAR), sensitivity, specificity and Matthews correlation coefficient (MCC) reach 92.48%, 93.05%, 93.63%, 92.48%, 94.93% and 0.8283, respectively. Using the synchronised PCG and ECG signals of the training-a dataset from CinC, 93.14%, 92.21%, 94.35%, 90.10%, 95.12% and 0.8380 are achieved for accuracy, UAR, sensitivity, specificity and MCC, respectively. Using a wearable vest dataset consisting of mPCG data, the model achieves 77.13% accuracy, 74.25% UAR, 86.47% sensitivity, 62.04% specificity, and 0.5082 MCC. These results demonstrate the effectiveness of transformer-based models for CVD detection when supported by augmented datasets, highlighting their potential to advance multimodal and multichannel heart sound classification.
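For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, UAR, sensitivity, specificity and MCC are conventionally computed for a binary classifier from confusion-matrix counts. It uses the standard textbook definitions and is not taken from the authors' code; the function name `binary_metrics`, the choice of "abnormal" as the positive class, and the example counts are assumptions made for illustration only.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion counts.

    Assumption: the positive class is "abnormal heart sound".
    tp/fn are abnormal recordings classified correctly/incorrectly,
    tn/fp are normal recordings classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    accuracy = (tp + tn) / total
    uar = (sensitivity + specificity) / 2  # unweighted mean of per-class recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "uar": uar,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    for name, value in binary_metrics(tp=90, fn=10, tn=80, fp=20).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, UAR for two classes is the mean of sensitivity and specificity, which agrees with the wearable vest figures quoted above: (86.47% + 62.04%) / 2 ≈ 74.25%.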

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Noise-Robust Contrastive Learning with an MFCC-Conformer For Coronary Artery Disease Detection
eess.AS · 2026-01 · unverdicted · novelty 4.0

A multichannel energy-based noisy-segment rejection step combined with an MFCC-Conformer classifier improves CAD detection accuracy to 78.4% on 297 subjects, a 4.1% gain over baseline training without rejection.