Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data

Sami O. Simons; Visara Urovi; Wafaa Aljbawi; Yuyang Yan

arXiv: 2402.07619 · v2 · submitted 2024-02-12 · cs.SD · cs.AI · eess.AS


Pith reviewed 2026-05-24 03:51 UTC · model grok-4.3


keywords: COVID-19 detection · voice recordings · deep learning · HuBERT · crowd-sourced data · Mel-spectrograms · speech analysis


The pith

HuBERT identifies COVID-19 from voice recordings at 86% accuracy and 0.93 AUC using crowd-sourced data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper builds deep learning models that classify COVID-19 status using only voice recordings collected through a mobile app. It processes 893 samples with features such as Mel-spectrograms and MFCCs, then compares LSTM, CNN, and HuBERT architectures against baseline methods. The HuBERT model reaches 86 percent accuracy and 0.93 AUC, the highest among those tested. These outcomes point to voice as a viable signal for scalable COVID-19 identification in the post-pandemic period. A sympathetic reader would see this as a step toward low-cost, non-contact diagnostic tools.

Core claim

The authors develop deep learning models to identify COVID-19 from voice recording data using the Cambridge COVID-19 Sound database of 893 speech samples. Voice features including Mel-spectrograms, MFCC, and CNN Encoder features are extracted and used to train LSTM, CNN, and HuBERT models. HuBERT achieves the highest accuracy of 86% and AUC of 0.93, outperforming other models and suggesting promising results for COVID-19 diagnosis from voice recordings.

What carries the argument

The HuBERT model, compared against LSTM and CNN classifiers on voice features (Mel-spectrograms, MFCCs, and CNN encoder features) extracted from crowd-sourced recordings for COVID-19 classification.

If this is right

Voice-based deep learning models can achieve over 85% accuracy in COVID-19 detection.
HuBERT outperforms LSTM and CNN for this classification task.
Crowd-sourced voice data supports training of effective prediction models.
The method provides a non-invasive and scalable approach to COVID-19 identification.
Results compare favorably to state-of-the-art methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such models could be deployed in mobile apps for at-home screening to reduce reliance on clinical tests.
The technique may extend to other respiratory illnesses that alter voice characteristics.
Independent clinical validation is needed to confirm performance beyond app-reported labels.
Combining voice data with additional inputs could further improve multi-variate prediction accuracy.

Load-bearing premise

The labels reported by app users accurately indicate true COVID-19 infection without clinical test verification.

What would settle it

Testing the model on voice recordings from participants with independently confirmed PCR-positive or negative COVID-19 status to check if accuracy stays near 86%.
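The check described here reduces to recomputing two numbers on the verified recordings. The following is a minimal sketch, assuming hypothetical lists `labels` (1 for PCR-positive, 0 for PCR-negative) and `scores` (model output probabilities). The function names and the 0.5 decision threshold are illustrative choices, not details taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of recordings whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data standing in for PCR-confirmed labels and model scores.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.1, 0.4, 0.6]
print(accuracy(example_labels, example_scores))
print(roc_auc(example_labels, example_scores))
```

The pairwise loop is quadratic in the number of recordings, which is acceptable for a corpus of a few hundred samples; a rank-based version would be preferable for larger sets.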

Figures

Figures reproduced from arXiv: 2402.07619 by Sami O. Simons, Visara Urovi, Wafaa Aljbawi, Yuyang Yan.

**Figure 1.** The pipeline used for both traditional machine learning classifiers and deep learning classifiers for COVID-19

**Figure 2.** User characteristics: (a) age, (b) gender, (c) COVID-19 test results, (d) the number of admissions to hospital

**Figure 3.** ROC curve for models

**Figure 4.** ROC curve for Coswara dataset validation

**Figure 5.** ROC curve for distinguishing COVID-19 from cold symptoms

Original abstract

COVID-19 has affected more than 223 countries worldwide and in the Post-COVID Era, there is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect COVID-19. We develop a deep learning model to identify COVID-19 from voice recording data. The novelty of this work is in the development of deep learning models for COVID-19 identification from only voice recordings. We use the Cambridge COVID-19 Sound database which contains 893 speech samples, crowd-sourced from 4352 participants via a COVID-19 Sounds app. Voice features including Mel-spectrograms and Mel-frequency cepstral coefficients (MFCC) and CNN Encoder features are extracted. Based on the voice data, we develop deep learning classification models to detect COVID-19 cases. These models include Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) and Hidden-Unit BERT (HuBERT). We compare their predictive power to baseline machine learning models. HuBERT achieves the highest accuracy of 86% and the highest AUC of 0.93. The results achieved with the proposed models suggest promising results in COVID-19 diagnosis from voice recordings when compared to the results obtained from the state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HuBERT reaches 86% accuracy and 0.93 AUC on the Cambridge voice set, but the result rests on unverified crowd-sourced labels.


The one or two things to know: this is an application of LSTM, CNN, and HuBERT to an existing crowd-sourced voice dataset for COVID-19 detection, with HuBERT at 86% accuracy and 0.93 AUC. The labels come from an app without clinical verification.

The paper extracts standard features like Mel-spectrograms and MFCCs, adds CNN encoder features, and trains the models. It compares them to baseline machine learning models. That comparison is the main contribution, and it is executed in a direct way. It does well at documenting the performance numbers on the 893 samples from 4352 participants. The claim that these are promising results compared to state-of-the-art is reasonable within the scope of the work.

The main concern is the ground truth. The stress-test note is on point here. Without documented checks against clinical tests or controls for who submits recordings, the high AUC could be inflated by label noise or bias. The abstract supplies no information on splits or statistical testing either, which makes it hard to trust the exact figures.

This paper is for specialists in voice-based health screening who follow this dataset. A general reader or someone wanting new methods will get little from it. I would not bring this to reading group. It is not something I would cite. The central assumption about the labels needs addressing, so it does not deserve a serious referee at this stage.

Referee Report

2 major / 1 minor

Summary. The manuscript develops deep learning classifiers (LSTM, CNN, HuBERT) on features including Mel-spectrograms, MFCCs, and CNN encoder outputs extracted from 893 crowd-sourced voice samples in the Cambridge COVID-19 Sound database. It reports that HuBERT attains the highest performance at 86% accuracy and 0.93 AUC for binary COVID-19 classification, outperforming the other models and baselines, and positions the results as promising for non-invasive detection.

Significance. If the reported metrics hold under rigorous evaluation, the work would demonstrate the applicability of self-supervised audio models like HuBERT to respiratory voice data for scalable screening. The scale of the crowd-sourced corpus and direct comparison among LSTM/CNN/HuBERT architectures provide a useful empirical baseline for this task.

major comments (2)

[Methods] Methods section: the abstract and text supply no details on train-test split ratios, cross-validation procedure, class-balance handling, or statistical testing, so the central claim that HuBERT reaches 86% accuracy and 0.93 AUC cannot be verified or reproduced from the given information.
[Data Description] Data section: the 893 samples rely on app-reported binary labels treated as ground truth, with no description of PCR/antigen confirmation, symptom cross-validation, or controls for selection bias and label noise; this assumption is load-bearing for the reported AUC and accuracy figures.

minor comments (1)

[Abstract] Abstract: the statement that results are compared to 'state-of-the-art' lacks explicit citations or tabulated baseline numbers, making the comparative claim difficult to assess.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important omissions in the methods and data sections that affect reproducibility and interpretation. We address each point below and commit to revisions that strengthen the manuscript without altering the core claims or results.


Referee: [Methods] Methods section: the abstract and text supply no details on train-test split ratios, cross-validation procedure, class-balance handling, or statistical testing, so the central claim that HuBERT reaches 86% accuracy and 0.93 AUC cannot be verified or reproduced from the given information.

Authors: We agree that the original manuscript omitted these critical experimental details. In the revised version we will add a dedicated subsection describing the protocol: an 80/20 train-test split stratified by participant to avoid leakage, 5-fold cross-validation on the training set, class-imbalance handling via weighted cross-entropy loss (weights inversely proportional to class frequencies), and statistical testing (McNemar’s test for accuracy differences and DeLong’s test for AUC comparisons). These additions will allow independent verification of the reported HuBERT performance. revision: yes
Referee: [Data Description] Data section: the 893 samples rely on app-reported binary labels treated as ground truth, with no description of PCR/antigen confirmation, symptom cross-validation, or controls for selection bias and label noise; this assumption is load-bearing for the reported AUC and accuracy figures.

Authors: The Cambridge COVID-19 Sound database supplies only self-reported labels collected through the mobile app; PCR or antigen confirmation is not available for the majority of samples. We will expand the data section to state this explicitly, cite the original database paper for the collection protocol, and add a limitations paragraph discussing label noise, selection bias, and the absence of clinical confirmation. No additional ground-truth information exists in the released dataset, so we cannot retroactively provide PCR validation. revision: yes
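Two pieces of the protocol named in the first response, a participant-level split and inverse-frequency class weights, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `split_by_participant` and `inverse_frequency_weights` are hypothetical helper names, and the `(participant_id, label)` tuple layout is assumed. The split keeps all recordings from one participant on the same side, which is what prevents leakage; it does not additionally balance labels between the two sides, and the 20% applies to participants, not to recordings.

```python
import random
from collections import Counter, defaultdict


def split_by_participant(samples, test_fraction=0.2, seed=0):
    """Split sample indices so that no participant appears in both sets.

    samples: list of (participant_id, label) tuples (assumed layout).
    Returns (train_indices, test_indices).
    """
    by_participant = defaultdict(list)
    for idx, (pid, _label) in enumerate(samples):
        by_participant[pid].append(idx)
    if len(by_participant) < 2:
        raise ValueError("need at least two participants to split")

    pids = sorted(by_participant)          # sort first for reproducibility
    random.Random(seed).shuffle(pids)
    n_test = max(1, round(len(pids) * test_fraction))
    test_pids = set(pids[:n_test])

    train, test = [], []
    for pid, idxs in by_participant.items():
        (test if pid in test_pids else train).extend(idxs)
    return sorted(train), sorted(test)


def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Uses total / (n_classes * class_count), so a balanced dataset
    gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}
```

The resulting weights would typically be passed to a weighted cross-entropy loss during training; the cross-validation folds mentioned in the response would need the same participant-level grouping to stay leakage-free.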

Circularity Check

0 steps flagged

No circularity: empirical ML accuracies on fixed dataset


The paper trains standard classifiers (LSTM, CNN, HuBERT) on Mel-spectrograms/MFCC features from the Cambridge COVID-19 Sound database and reports held-out accuracy/AUC. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes that reduce the reported metrics to the inputs by construction. The evaluation follows conventional supervised learning practice on an external crowd-sourced corpus; the central numbers are not tautological with the training procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that app-provided COVID labels are accurate and that the collected voice samples are free of systematic selection or labeling bias; no free parameters, axioms, or invented entities are declared.

axioms (1)

Domain assumption: Crowd-sourced labels from the COVID-19 Sounds app constitute reliable ground truth.
Invoked when training and evaluating all models on the 893 samples.

pith-pipeline@v0.9.0 · 5767 in / 1178 out tokens · 24425 ms · 2026-05-24T03:51:23.258390+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimising MFCC parameters for the automatic detection of respiratory diseases
cs.SD 2024-08 conditional novelty 3.0

Empirical tuning of MFCC parameters (roughly 30 coefficients, shorter hops, dataset-dependent frame lengths) improves SVM accuracy for respiratory disease detection by 14.9-19.6% on COVID-19 and voice-disorder datasets.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Worldometer, “Covid-19 coronavirus outbreak,” 2023.
[2] C.-C. Lai, T.-P. Shih, W.-C. Ko, H.-J. Tang, and P.-R. Hsueh, “Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and coronavirus disease-2019 (covid-19): The epidemic and the challenges,” International journal of antimicrobial agents, vol. 55, no. 3, p. 105924, 2020.
[3] N. C. for Immunization et al., “Science brief: Sars-cov-2 and surface (fomite) transmission for indoor community environments,” in CDC COVID-19 Science Briefs [Internet]. Centers for Disease Control and Prevention (US), 2021.
[4] R. Ningthoujam, “Covid 19 can spread through breathing, talking, study estimates,” Current medicine research and practice, vol. 10, no. 3, p. 132, 2020.
[5] J. Han, T. Xia, D. Spathis, E. Bondareva, C. Brown, J. Chauhan, T. Dang, A. Grammenos, A. Hasthanasombat, A. Floto et al., “Sounds of covid-19: exploring realistic performance of audio-based digital testing,” NPJ digital medicine, vol. 5, no. 1, pp. 1–9, 2022.
[6] B. Stasak, Z. Huang, S. Razavi, D. Joachim, and J. Epps, “Automatic detection of covid-19 based on short-duration acoustic smartphone speech analysis,” Journal of Healthcare Informatics Research, vol. 5, no. 2, pp. 201–217, 2021.
[7] A. Hassan, I. Shahin, and M. B. Alsabek, “Covid-19 detection system using recurrent neural networks,” in 2020 International conference on communications, computing, cybersecurity, and informatics (CCCI). IEEE, 2020, pp. 1–5.
[8] M. A. Mehrabadi, S. A. H. Aqajari, I. Azimi, C. A. Downs, N. Dutt, and A. M. Rahmani, “Detection of covid-19 using heart rate and blood pressure: Lessons learned from patients with ards,” in 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2021, pp. 2140–2143.
[9] J.-S. Liang and K. Wang, “Vibration feature extraction using audio spectrum analyzer based machine learning,” in 2017 International conference on information, Communication and Engineering (ICICE). IEEE, 2017, pp. 381–384.
[10] C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, “Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data,” arXiv preprint arXiv:2006.05919, 2020.
[11] A. Vahedian-Azimi, A. Keramatfar, M. Asiaee, S. S. Atashi, and M. Nourbakhsh, “Do you have covid-19? an artificial intelligence-based screening tool for covid-19 using acoustic parameters,” The Journal of the Acoustical Society of America, vol. 150, no. 3, pp. 1945–1953, 2021.
[12] V. Despotovic, M. Ismael, M. Cornil, R. Mc Call, and G. Fagherazzi, “Detection of covid-19 from voice, cough and breathing patterns: Dataset and preliminary results,” Computers in Biology and Medicine, vol. 138, p. 104944, 2021.
[13] M. Arshadi, F. Fardsanei, B. Deihim, Z. Farshadzadeh, F. Nikkhahi, F. Khalili, G. Sotgiu, A. H. Shahidi Bonjar, R. Centis, G. B. Migliori et al., “Diagnostic accuracy of rapid antigen tests for covid-19 detection: a systematic review with meta-analysis,” Frontiers in medicine, vol. 9, p. 984, 2022.
[14] A. B. Nassif, I. Shahin, M. Bader, A. Hassan, and N. Werghi, “Covid-19 detection systems using deep-learning algorithms based on speech and image data,” Mathematics, vol. 10, no. 4, p. 564, 2022.
[15] Y. Chang, X. Jing, Z. Ren, and B. W. Schuller, “Covnet: A transfer learning framework for automatic covid-19 detection from crowd-sourced cough sounds,” Frontiers in Digital Health, vol. 3, 2021.
[16] M. Aly, K. H. Rahouma, and S. M. Ramzy, “Pay attention to the speech: Covid-19 diagnosis using machine learning and crowdsourced respiratory and speech recordings,” Alexandria Engineering Journal, vol. 61, no. 5, pp. 3487–3500, 2022.
[17] B. W. Schuller, A. Batliner, C. Bergler, C. Mascolo, J. Han, I. Lefter, H. Kaya, S. Amiriparian, A. Baird, L. Stappen et al., “The interspeech 2021 computational paralinguistics challenge: Covid-19 cough, covid-19 speech, escalation & primates,” arXiv preprint arXiv:2102.13468, 2021.
[18] G. Fagherazzi, A. Fischer, M. Ismael, and V. Despotovic, “Voice for health: The use of vocal biomarkers from research to clinical practice,” Digital biomarkers, vol. 5, no. 1, pp. 78–88, 2021.
[19] K. K. Lella and A. Pja, “Automatic diagnosis of covid-19 disease using deep convolutional neural network with multi-feature channel from respiratory sound data: cough, voice, and breath,” Alexandria Engineering Journal, vol. 61, no. 2, pp. 1319–1334, 2022.
[20] P. Suppakitjanusant, S. Sungkanuparph, T. Wongsinin, S. Virapongsiri, N. Kasemkosin, L. Chailurkit, and B. Ongphiphadhanakul, “Identifying individuals with recent covid-19 through voice classification using deep learning,” Scientific Reports, vol. 11, no. 1, pp. 1–7, 2021.
[21] S. Bromuri, A. P. Henkel, D. Iren, and V. Urovi, “Using ai to predict service agent stress from emotion patterns in service interactions,” Journal of Service Management, vol. 32, no. 4, pp. 581–611, 2021.
[22] M. Zakariah, Y. Ajmi Alothaibi, Y. Guo, K. Tran-Trung, M. M. Elahi et al., “An analytical study of speech pathology detection based on mfcc and deep neural networks,” Computational and Mathematical Methods in Medicine, vol. 2022, 2022.
[23] B. Logan, “Mel frequency cepstral coefficients for music modeling,” in International Symposium on Music Information Retrieval. Citeseer, 2000.
[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[25] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995.
[26] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, 2015.
[27] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp. 115–133, 1943.
[28] W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, “Hubert: Self-supervised speech representation learning by masked prediction of hidden units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021.
[29] G. Solana-Lavalle and R. Rosas-Romero, “Analysis of voice as an assisting tool for detection of parkinson’s disease and its subsequent clinical interpretation,” Biomedical Signal Processing and Control, vol. 66, p. 102415, 2021.
[30] T. J. Wroge, Y. Özkanca, C. Demiroglu, D. Si, D. C. Atkins, and R. H. Ghomi, “Parkinson’s disease diagnosis using machine learning and voice,” in 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE, 2018, pp. 1–7.
[31] S. Hamdi, M. Oussalah, A. Moussaoui, and M. Saidi, “Attention-based hybrid cnn-lstm and spectral data augmentation for covid-19 diagnosis from cough sound,” Journal of Intelligent Information Systems, vol. 59, no. 2, pp. 367–389, 2022.
[32] M. R. Kamble, J. Patino, M. A. Zuluaga, and M. Todisco, “Exploring auditory acoustic features for the diagnosis of covid-19,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 566–570.
[33] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. K. Ghosh, S. Ganapathy et al., “Coswara–a database of breathing, cough, and voice sounds for covid-19 diagnosis,” arXiv preprint arXiv:2005.10548, 2020.
[34] M. Huzaifah, “Comparison of time-frequency representations for environmental sound classification using convolutional neural networks,” arXiv preprint arXiv:1706.07156, 2017.
[35] V. S. Nallanthighal, “Respiratory health sensing from speech,” Ph.D. dissertation, Amsterdam: LOT, 2022.
[36] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with lstm,” Neural computation, vol. 12, no. 10, pp. 2451–2471, 2000.
[37] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: an asr corpus based on public domain audio books,” in 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2015, pp. 5206–5210.
[38] M. Aly and N. S. Alotaibi, “A novel deep learning model to detect covid-19 based on wavelet features extracted from mel-scale spectrogram of patients’ cough and breathing sounds,” Informatics in Medicine Unlocked, vol. 32, p. 101049, 2022.

[1] [1]

Covid-19 coronavirus outbreak,

Worldometer, “Covid-19 coronavirus outbreak,” 2023

work page 2023

[2] [2]

Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and coronavirus disease-2019 (covid-19): The epidemic and the challenges,

C.-C. Lai, T.-P . Shih, W.-C. Ko, H.-J. Tang, and P .-R. Hsueh, “Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and coronavirus disease-2019 (covid-19): The epidemic and the challenges,” International journal of antimicrobial agents, vol. 55, no. 3, p. 105924, 2020

work page 2019

[3] [3]

Science brief: Sars-cov-2 and surface (fomite) transmission for indoor community environments,

N. C. for Immunization et al. , “Science brief: Sars-cov-2 and surface (fomite) transmission for indoor community environments,” in CDC COVID-19 Science Briefs [Internet]. Centers for Disease Control and Prevention (US), 2021

work page 2021

[4] [4]

Covid 19 can spread through breathing, talking, study estimates,

R. Ningthoujam, “Covid 19 can spread through breathing, talking, study estimates,” Current medicine research and practice , vol. 10, no. 3, p. 132, 2020

work page 2020

[5] [5]

Sounds of covid-19: exploring realistic performance of audio-based digital testing,

J. Han, T. Xia, D. Spathis, E. Bondareva, C. Brown, J. Chauhan, T. Dang, A. Grammenos, A. Hasthanasombat, A. Floto et al. , “Sounds of covid-19: exploring realistic performance of audio-based digital testing,” NPJ digital medicine, vol. 5, no. 1, pp. 1–9, 2022

work page 2022

[6] [6]

Automatic detection of covid-19 based on short-duration acoustic smartphone speech analysis,

B. Stasak, Z. Huang, S. Razavi, D. Joachim, and J. Epps, “Automatic detection of covid-19 based on short-duration acoustic smartphone speech analysis,” Journal of Healthcare Informatics Research, vol. 5, no. 2, pp. 201–217, 2021

work page 2021

[7] [7]

Covid-19 detection system using recurrent neural networks,

A. Hassan, I. Shahin, and M. B. Alsabek, “Covid-19 detection system using recurrent neural networks,” in 2020 International conference on communications, computing, cybersecurity, and informatics (CCCI). IEEE, 2020, pp. 1–5

work page 2020

[8] [8]

Detection of covid-19 using heart rate and blood pressure: Lessons learned from patients with ards,

M. A. Mehrabadi, S. A. H. Aqajari, I. Azimi, C. A. Downs, N. Dutt, and A. M. Rahmani, “Detection of covid-19 using heart rate and blood pressure: Lessons learned from patients with ards,” in 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2021, pp. 2140–2143. SEPTEMBER 2023 14

work page 2021

[9] [9]

Vibration feature extraction using audio spectrum analyzer based machine learning,

J.-S. Liang and K. Wang, “Vibration feature extraction using audio spectrum analyzer based machine learning,” in 2017 International conference on information, Communication and Engineering (ICICE) . IEEE, 2017, pp. 381–384

work page 2017

[10] [10]

Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data,

C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P . Cicuta, and C. Mascolo, “Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data,” arXiv preprint arXiv:2006.05919, 2020

work page arXiv 2006

[11] [11]

Do you have covid-19? an artificial intelligence-based screening tool for covid-19 using acoustic parameters,

A. Vahedian-Azimi, A. Keramatfar, M. Asiaee, S. S. Atashi, and M. Nourbakhsh, “Do you have covid-19? an artificial intelligence-based screening tool for covid-19 using acoustic parameters,” The Journal of the Acoustical Society of America , vol. 150, no. 3, pp. 1945–1953, 2021

work page 1945

[12] [12]

Detection of covid-19 from voice, cough and breathing patterns: Dataset and preliminary results,

V . Despotovic, M. Ismael, M. Cornil, R. Mc Call, and G. Fagherazzi, “Detection of covid-19 from voice, cough and breathing patterns: Dataset and preliminary results,” Computers in Biology and Medicine, vol. 138, p. 104944, 2021

work page 2021

[13] [13]

Diagnostic accuracy of rapid antigen tests for covid-19 detection: a systematic review with meta-analysis,

M. Arshadi, F. Fardsanei, B. Deihim, Z. Farshadzadeh, F. Nikkhahi, F. Khalili, G. Sotgiu, A. H. Shahidi Bonjar, R. Centis, G. B. Migliori et al., “Diagnostic accuracy of rapid antigen tests for covid-19 detection: a systematic review with meta-analysis,” Frontiers in medicine , vol. 9, p. 984, 2022

work page 2022

[14] [14]

Covid-19 detection systems using deep-learning algorithms based on speech and image data,

A. B. Nassif, I. Shahin, M. Bader, A. Hassan, and N. Werghi, “Covid-19 detection systems using deep-learning algorithms based on speech and image data,” Mathematics, vol. 10, no. 4, p. 564, 2022

work page 2022

[15] [15]

Covnet: A transfer learning framework for automatic covid-19 detection from crowd-sourced cough sounds,

Y. Chang, X. Jing, Z. Ren, and B. W. Schuller, “Covnet: A transfer learning framework for automatic covid-19 detection from crowd-sourced cough sounds,” Frontiers in Digital Health, vol. 3, 2021

work page 2021

[16] [16]

Pay attention to the speech: Covid-19 diagnosis using machine learning and crowdsourced respiratory and speech recordings,

M. Aly, K. H. Rahouma, and S. M. Ramzy, “Pay attention to the speech: Covid-19 diagnosis using machine learning and crowdsourced respiratory and speech recordings,” Alexandria Engineering Journal, vol. 61, no. 5, pp. 3487–3500, 2022

work page 2022

[17] [17]

The interspeech 2021 computational paralinguistics challenge: Covid-19 cough, covid-19 speech, escalation & primates,

B. W. Schuller, A. Batliner, C. Bergler, C. Mascolo, J. Han, I. Lefter, H. Kaya, S. Amiriparian, A. Baird, L. Stappen et al., “The interspeech 2021 computational paralinguistics challenge: Covid-19 cough, covid-19 speech, escalation & primates,” arXiv preprint arXiv:2102.13468, 2021

work page arXiv 2021

[18] [18]

Voice for health: The use of vocal biomarkers from research to clinical practice,

G. Fagherazzi, A. Fischer, M. Ismael, and V . Despotovic, “Voice for health: The use of vocal biomarkers from research to clinical practice,” Digital biomarkers, vol. 5, no. 1, pp. 78–88, 2021

work page 2021

[19] K. K. Lella and A. Pja, “Automatic diagnosis of covid-19 disease using deep convolutional neural network with multi-feature channel from respiratory sound data: cough, voice, and breath,” Alexandria Engineering Journal, vol. 61, no. 2, pp. 1319–1334, 2022

[20] P. Suppakitjanusant, S. Sungkanuparph, T. Wongsinin, S. Virapongsiri, N. Kasemkosin, L. Chailurkit, and B. Ongphiphadhanakul, “Identifying individuals with recent covid-19 through voice classification using deep learning,” Scientific Reports, vol. 11, no. 1, pp. 1–7, 2021

[21] S. Bromuri, A. P. Henkel, D. Iren, and V. Urovi, “Using ai to predict service agent stress from emotion patterns in service interactions,” Journal of Service Management, vol. 32, no. 4, pp. 581–611, 2021

[22] M. Zakariah, Y. Ajmi Alothaibi, Y. Guo, K. Tran-Trung, M. M. Elahi et al., “An analytical study of speech pathology detection based on mfcc and deep neural networks,” Computational and Mathematical Methods in Medicine, vol. 2022, 2022

[23] B. Logan, “Mel frequency cepstral coefficients for music modeling,” in International Symposium on Music Information Retrieval. Citeseer, 2000

[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997

[25] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995

[26] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, 2015

[27] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp. 115–133, 1943

[28] W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, “Hubert: Self-supervised speech representation learning by masked prediction of hidden units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021

[29] G. Solana-Lavalle and R. Rosas-Romero, “Analysis of voice as an assisting tool for detection of parkinson’s disease and its subsequent clinical interpretation,” Biomedical Signal Processing and Control, vol. 66, p. 102415, 2021

[30] T. J. Wroge, Y. Özkanca, C. Demiroglu, D. Si, D. C. Atkins, and R. H. Ghomi, “Parkinson’s disease diagnosis using machine learning and voice,” in 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE, 2018, pp. 1–7

[31] S. Hamdi, M. Oussalah, A. Moussaoui, and M. Saidi, “Attention-based hybrid cnn-lstm and spectral data augmentation for covid-19 diagnosis from cough sound,” Journal of Intelligent Information Systems, vol. 59, no. 2, pp. 367–389, 2022

[32] M. R. Kamble, J. Patino, M. A. Zuluaga, and M. Todisco, “Exploring auditory acoustic features for the diagnosis of covid-19,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 566–570

[33] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. K. Ghosh, S. Ganapathy et al., “Coswara–a database of breathing, cough, and voice sounds for covid-19 diagnosis,” arXiv preprint arXiv:2005.10548, 2020

[34] M. Huzaifah, “Comparison of time-frequency representations for environmental sound classification using convolutional neural networks,” arXiv preprint arXiv:1706.07156, 2017

[35] V. S. Nallanthighal, “Respiratory health sensing from speech,” Ph.D. dissertation, Amsterdam: LOT, 2022

[36] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with lstm,” Neural computation, vol. 12, no. 10, pp. 2451–2471, 2000

[37] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: an asr corpus based on public domain audio books,” in 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2015, pp. 5206–5210

[38] M. Aly and N. S. Alotaibi, “A novel deep learning model to detect covid-19 based on wavelet features extracted from mel-scale spectrogram of patients’ cough and breathing sounds,” Informatics in Medicine Unlocked, vol. 32, p. 101049, 2022