COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation

Darsh Kaushik; Hoang Van Truong; Koushik Guha; Saranga Kingkor Mahanta; Shubham Jain

arxiv: 2110.06123 · v3 · pith:EWVEJV2Gnew · submitted 2021-10-12 · 💻 cs.SD · eess.AS

Saranga Kingkor Mahanta, Darsh Kaushik, Shubham Jain, Hoang Van Truong, Koushik Guha

Pith reviewed 2026-05-24 12:34 UTC · model grok-4.3

classification 💻 cs.SD eess.AS

keywords COVID-19 diagnosis · cough acoustics · ConvNet · data augmentation · MFCC features · DiCOVA challenge · acoustic classification · machine learning

The pith

A ConvNet on MFCC features from cough recordings detects COVID-19 at 87.07 percent AUC-ROC after data augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that subtle differences in cough acoustics can be used by statistical models to separate COVID-19 positive from negative cases. A convolutional neural network processes Mel frequency cepstral coefficients extracted from the DiCOVA 2021 Track 1 dataset of sound recordings. The base ConvNet reaches 72.23 percent AUC-ROC on the blind test set. Adding data augmentation raises performance to 87.07 percent, which exceeds the challenge baseline by 23 percent and places the model first on the leaderboard.

Core claim

The ConvNet model achieved an AUC-ROC of 72.23 percent on the blind test set, which the challenge provided for an unbiased evaluation of the models. Incorporating data augmentation raised the AUC-ROC from 72.23 to 87.07 percent. The model also outperformed the DiCOVA 2021 Challenge's baseline model by 23 percent, thereby claiming the top position on the DiCOVA 2021 Challenge leaderboard. The paper proposes Mel frequency cepstral coefficients as the feature input for the model.
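The 23 percent margin is stated without saying whether it is measured in absolute AUC points or relative to the baseline. As a rough worked check, using only the two AUC figures and the margin quoted here, the sketch below computes the augmentation gain and the baseline AUC that each reading would imply. The variable names are illustrative, and the implied baselines are derived under an assumption; they are not numbers reported by the paper.

```python
# AUC-ROC figures quoted in the abstract (percent).
base_auc = 72.23
augmented_auc = 87.07
margin = 23.0  # "outperformed the baseline by 23 percent"

# Gain attributed to augmentation, in percentage points and in relative terms.
gain_points = augmented_auc - base_auc
gain_relative = 100.0 * gain_points / base_auc

# Baseline AUC implied by each reading of the 23 percent margin.
# Assumption: these are back-calculated, not values taken from the paper.
baseline_if_absolute = augmented_auc - margin
baseline_if_relative = augmented_auc / (1.0 + margin / 100.0)

print(f"gain: {gain_points:.2f} points ({gain_relative:.1f}% relative)")
print(f"implied baseline, absolute reading: {baseline_if_absolute:.2f}")
print(f"implied baseline, relative reading: {baseline_if_relative:.2f}")
```

The two readings differ by several AUC points, so the margin is only interpretable once the baseline's own test-set score is stated alongside it.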

What carries the argument

ConvNet classifier that takes Mel frequency cepstral coefficients from cough recordings as input and applies data augmentation during training.
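For readers unfamiliar with the "Mel" part of MFCC, the sketch below shows how filter center frequencies are spaced evenly on the mel scale, which makes them dense at low frequencies and sparse at high ones. It assumes the widely used HTK-style mel formula; the function names, the filter count, and the 0 to 8000 Hz range are hypothetical choices for illustration, since this view of the paper does not give its MFCC settings. A full MFCC pipeline would additionally apply these filters to a short-time power spectrum, take logs, and apply a discrete cosine transform.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters filters spaced evenly in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical settings: 10 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(10, 0.0, 8000.0)
print([round(c) for c in centers])
```

The widening gaps between successive centers are the reason MFCCs emphasize low-frequency detail, which is one plausible motivation for using them on cough audio.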

If this is right

Cough sound recordings contain information that allows statistical models to classify COVID-19 status above chance.
Data augmentation improves the performance of acoustic ConvNets when training data are limited.
MFCC features paired with convolutional layers form an effective pipeline for this classification task.
The reported model sets the highest score among submissions to the DiCOVA 2021 Track 1 leaderboard.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same acoustic pipeline could be tested on other respiratory conditions to check whether the learned patterns are COVID-specific.
Deployment would require checking whether performance holds when recordings come from different microphones or environments.
Combining the acoustic score with other low-cost signals might raise overall screening accuracy without adding contact.

Load-bearing premise

Cough acoustics contain consistent, generalizable differences between COVID-19 positive and negative individuals that the chosen features and model can capture on the DiCOVA dataset distribution.

What would settle it

A new independent collection of cough recordings on which the augmented ConvNet achieves an AUC-ROC at or below the DiCOVA baseline level.

Figures

Figures reproduced from arXiv: 2110.06123 by Darsh Kaushik, Hoang Van Truong, Koushik Guha, Saranga Kingkor Mahanta, Shubham Jain.

**Figure 3.** Class distribution of the augmented dataset

**Figure 2.** Distribution of duration of audio samples in the dataset

**Figure 4.** Proposed CNN architecture. … architectures, including our proposed model and the baseline model, are shown in Table I. The baseline model of the DiCOVA challenge used a Random Forest classifier trained with fifty trees. The participants were instructed to submit the output probabilities corresponding to each sound recording name. The blind test set was provided in the DiCOVA 2021 challenge, along with the dat…

**Figure 5.** ROC curves depicting model performance on each fold

**Figure 6.** Averaged model decisions computed at 80% sensitivity

Original abstract

With the periodic rise and fall of COVID-19 and countries being inflicted by its waves, an efficient, economic, and effortless diagnosis procedure for the virus has been the utmost need of the hour. COVID-19 positive individuals may even be asymptomatic making the diagnosis difficult, but amongst the infected subjects, the asymptomatic ones need not be entirely free of symptoms caused by the virus. They might not show any observable symptoms like the symptomatic subjects, but they may differ from uninfected ones in the way they cough. These differences in the coughing sounds are minute and indiscernible to the human ear, however, these can be captured using machine learning-based statistical models. In this paper, we present a deep learning approach to analyze the acoustic dataset provided in Track 1 of the DiCOVA 2021 Challenge containing cough sound recordings belonging to both COVID-19 positive and negative examples. To perform the classification on the sound recordings as belonging to a COVID-19 positive or negative examples, we propose a ConvNet model. Our model achieved an AUC score percentage of 72.23 on the blind test set provided by the same for an unbiased evaluation of the models. The ConvNet model incorporated with Data Augmentation further increased the AUC-ROC percentage from 72.23 to 87.07. It also outperformed the DiCOVA 2021 Challenge's baseline model by 23% thus, claiming the top position on the DiCOVA 2021 Challenge leaderboard. This paper proposes the use of Mel frequency cepstral coefficients as the feature input for the proposed model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Standard ConvNet on MFCCs with augmentation reaches top spot on DiCOVA 2021 cough leaderboard at 87 AUC, but the abstract supplies almost no experimental controls so the jump from 72 cannot be evaluated.

The paper takes a conventional ConvNet, feeds it MFCC features from the DiCOVA Track 1 cough recordings, and shows that data augmentation lifts AUC from 72.23 to 87.07 on the blind test set while beating the challenge baseline by 23 percent and taking first place. That is the concrete result on offer. The augmentation step is the only clear addition, and the size of the lift indicates it addressed whatever was limiting the base model on this data. Beyond that the work applies established audio classification tools to a new but publicly available benchmark; no new architecture, loss, or derivation is introduced. The main limitation is the absence of any description of dataset size, how the train-test split was made, whether folds or stratification were used, or checks for device, age, or gender effects. Without those details the reported numbers cannot be assessed for overfitting or hidden covariate leakage, which directly touches the stress-test concern about generalizability. The abstract alone does not let us tell whether the model learned COVID-specific acoustics or something narrower about the DiCOVA collection process. Readers already working on the DiCOVA challenge or similar audio-health tasks will find the leaderboard number useful as a reference point. Anyone seeking evidence that cough acoustics yield a robust, device-independent screen will need the full methods section before drawing that conclusion. The paper should go to peer review so the experimental controls can be examined; the result is competitive enough on a public benchmark to justify referee time even if the approach is incremental.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a ConvNet model that takes MFCC features extracted from cough recordings as input to classify samples as COVID-19 positive or negative. On the blind test set of DiCOVA 2021 Challenge Track 1 the model obtains an AUC-ROC of 72.23 %; the addition of data augmentation raises this to 87.07 %, outperforming the challenge baseline by 23 % and placing first on the leaderboard.

Significance. If the performance numbers prove robust under proper validation protocols and external testing, the work would illustrate that standard audio pipelines can achieve competitive results on a public cough-based COVID benchmark. The absence of methodological transparency, however, currently prevents any assessment of whether the reported lift reflects genuine acoustic biomarkers or dataset-specific covariates.

major comments (3)

[Abstract] The performance claims rest on AUC-ROC figures of 72.23 % and 87.07 % yet supply no information whatsoever on dataset cardinality, train-test split ratios, cross-validation scheme, stratification by device or demographics, or statistical testing. These omissions are load-bearing for the central generalization claim.
[Abstract] No description is given of the augmentation operations (type, parameters, or whether they were applied only to training data), making it impossible to determine whether the 15-point AUC gain arises from improved generalization or from unintended correlation with label distribution or test-set covariates.
[Abstract] The claim that the model “outperformed the DiCOVA 2021 Challenge’s baseline model by 23 %” and “claim[ed] the top position” cannot be evaluated without confirmation that the evaluation protocol exactly matches the challenge rules and that no post-hoc tuning on the blind test set occurred.

minor comments (1)

[Abstract] The repeated phrase “AUC score percentage” is redundant; AUC-ROC is already expressed as a percentage.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed comments on the abstract. We address each point below and will revise the abstract to incorporate additional methodological details for improved transparency while preserving its conciseness.

Referee: [Abstract] The performance claims rest on AUC-ROC figures of 72.23 % and 87.07 % yet supply no information whatsoever on dataset cardinality, train-test split ratios, cross-validation scheme, stratification by device or demographics, or statistical testing. These omissions are load-bearing for the central generalization claim.

Authors: The DiCOVA 2021 Track 1 challenge supplies a fixed training set and a blind test set for evaluation; no cross-validation or custom splits were used. Dataset cardinality and participant details are described in the manuscript body. Device and demographic stratification were not performed because the challenge data release did not include such metadata. We will add a brief statement to the abstract noting the use of the official challenge split and the absence of additional validation protocols.
Revision: yes

Referee: [Abstract] No description is given of the augmentation operations (type, parameters, or whether they were applied only to training data), making it impossible to determine whether the 15-point AUC gain arises from improved generalization or from unintended correlation with label distribution or test-set covariates.

Authors: The augmentation pipeline (time stretching, pitch shifting, and additive noise) and its parameters are specified in the Methods section. Augmentation was applied exclusively to the training partition, with a validation subset drawn from the training data. We will insert a short clause in the abstract summarizing the augmentation strategy and confirming it was training-only.
Revision: yes

Referee: [Abstract] The claim that the model “outperformed the DiCOVA 2021 Challenge’s baseline model by 23 %” and “claim[ed] the top position” cannot be evaluated without confirmation that the evaluation protocol exactly matches the challenge rules and that no post-hoc tuning on the blind test set occurred.

Authors: All submissions, including ours, were evaluated by the challenge organizers on the held-out blind test set via their official platform; test labels were never released to participants. The reported improvement and leaderboard position therefore reflect strict adherence to the published challenge protocol with no post-hoc tuning. We will append a confirming clause to the abstract.
Revision: yes
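The training-only constraint discussed in the second response can be made concrete with a small sketch. The rebuttal names time stretching, pitch shifting, and additive noise but gives no parameters here, so everything below is an assumption: the function names, the 20 dB signal-to-noise ratio, and the speed factors are hypothetical. Note also that the resampling function is a deliberately naive stand-in: it changes duration and pitch together, whereas a true time stretch or pitch shift changes one while holding the other fixed and usually relies on a phase vocoder.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at approximately the requested SNR (dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]

def resample_linear(signal, rate):
    """Naive speed change by linear interpolation.

    rate > 1 shortens the signal and raises its pitch; rate < 1 does the
    opposite. This is NOT a pitch-preserving time stretch.
    """
    n_out = max(1, int(len(signal) / rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        j = min(int(pos), last)
        frac = pos - j
        out.append((1.0 - frac) * signal[j] + frac * signal[min(j + 1, last)])
    return out

def augment_training_set(train, rng, snr_db=20.0, rates=(0.9, 1.1)):
    """Return the originals plus augmented copies; labels are preserved.

    Only the training partition is passed in. Validation and test
    recordings are never augmented.
    """
    augmented = []
    for signal, label in train:
        augmented.append((signal, label))
        augmented.append((add_noise(signal, snr_db, rng), label))
        for rate in rates:
            augmented.append((resample_linear(signal, rate), label))
    return augmented

# Toy data: two synthetic "recordings" standing in for cough audio.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
train = [(tone, 1), ([0.5 * x for x in tone], 0)]
train_aug = augment_training_set(train, rng)
print(len(train), "->", len(train_aug))
```

Splitting the data before calling `augment_training_set` matters: if augmented copies of one recording land on both sides of a train/validation split, validation scores are inflated by near-duplicates.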

Circularity Check

0 steps flagged

No circularity: standard ML pipeline on blind test set with no self-referential derivations or fitted predictions.

The paper reports training a ConvNet on MFCC features extracted from the DiCOVA 2021 cough dataset, with data augmentation, and evaluates AUC on the provided blind test set. No equations, parameter-fitting steps, uniqueness theorems, or self-citations are described that would reduce the reported performance lift (72.23 to 87.07) to a construction or post-hoc fit on the target metric. The result is presented as an empirical outcome on held-out data rather than a derived identity, making the derivation chain self-contained against external benchmarks.
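Since every headline number in this review is an AUC-ROC, and Figure 6 refers to decisions at 80% sensitivity, a minimal sketch of both quantities may help. It computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, then finds the specificity at the highest threshold that still reaches the target sensitivity. The function names and the toy labels and scores are hypothetical; this is not the challenge's official scoring code.

```python
import math

def auc_roc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def specificity_at_sensitivity(labels, scores, target=0.80):
    """Threshold and specificity at the highest threshold reaching `target`."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Number of positives that must be flagged to reach the target.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    threshold = pos[k - 1]  # predict positive when score >= threshold
    true_neg = sum(1 for s in neg if s < threshold)
    return threshold, true_neg / len(neg)

# Toy example: 5 positives followed by 5 negatives.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.5, 0.35, 0.2, 0.1]

auc = auc_roc(labels, scores)
threshold, spec = specificity_at_sensitivity(labels, scores)
print(f"AUC = {auc:.2f}, threshold = {threshold}, specificity = {spec:.2f}")
```

On this toy data the AUC is 0.80 and the specificity at 80% sensitivity is 0.60, which illustrates that a respectable AUC can coexist with a modest specificity at a clinically motivated operating point.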

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view supplies no explicit free parameters, axioms, or invented entities; the approach implicitly assumes that MFCC features are sufficient and that the challenge data distribution matches real-world use.

