arxiv: 2110.06123 · v3 · pith:EWVEJV2Gnew · submitted 2021-10-12 · 💻 cs.SD · eess.AS

COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation

Pith reviewed 2026-05-24 12:34 UTC · model grok-4.3

classification 💻 cs.SD eess.AS
keywords COVID-19 diagnosis · cough acoustics · ConvNet · data augmentation · MFCC features · DiCOVA challenge · acoustic classification · machine learning

The pith

A ConvNet on MFCC features from cough recordings detects COVID-19 at 87.07 percent AUC-ROC after data augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that subtle differences in cough acoustics can be used by statistical models to separate COVID-19 positive from negative cases. A convolutional neural network processes Mel frequency cepstral coefficients extracted from the DiCOVA 2021 Track 1 dataset of sound recordings. The base ConvNet reaches 72.23 percent AUC-ROC on the blind test set. Adding data augmentation raises performance to 87.07 percent, which exceeds the challenge baseline by 23 percent and places the model first on the leaderboard.

Core claim

The ConvNet model achieved an AUC-ROC of 72.23 percent on the blind test set, which the challenge provided for an unbiased evaluation of the models. Incorporating data augmentation increased the AUC-ROC from 72.23 to 87.07 percent. The augmented model also outperformed the DiCOVA 2021 Challenge's baseline model by 23 percent, claiming the top position on the DiCOVA 2021 Challenge leaderboard. The paper proposes Mel frequency cepstral coefficients as the feature input for the model.
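
The reported gains can be read either as absolute percentage points or as relative percentages, and the text does not say which reading applies to the 23 percent figure. The short Python sketch below works through the arithmetic for both readings. The function names are illustrative, and the implied baseline values are back-calculated assumptions, not figures reported by the paper.

```python
def absolute_gain(before, after):
    """Gain in percentage points."""
    return after - before


def relative_gain(before, after):
    """Gain as a percentage of the starting value."""
    return 100.0 * (after - before) / before


base_auc, aug_auc = 72.23, 87.07  # AUC-ROC values quoted in the text

points = absolute_gain(base_auc, aug_auc)    # roughly 14.84 points
percent = relative_gain(base_auc, aug_auc)   # roughly 20.5 percent

# Baseline implied by "outperformed by 23 percent" under each reading.
# Both values are hypothetical back-calculations.
baseline_if_points = aug_auc - 23.0
baseline_if_relative = aug_auc / 1.23

print(f"augmentation gain: {points:.2f} points ({percent:.2f}% relative)")
print(f"implied baseline: {baseline_if_points:.2f} (points reading), "
      f"{baseline_if_relative:.2f} (relative reading)")
```

The two readings imply noticeably different baselines, which is one reason a reader may want the comparison stated in percentage points.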

What carries the argument

ConvNet classifier that takes Mel frequency cepstral coefficients from cough recordings as input and applies data augmentation during training.
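
As a rough illustration of the feature step, the sketch below computes MFCCs for a single audio frame using only the Python standard library: Hamming window, power spectrum, triangular mel filterbank, logarithm, then a DCT-II. The frame length, sample rate, filter count, and number of coefficients are assumptions chosen for the demo, and the mel formula is one common convention. The paper's own extraction settings are not given on this page, and an audio library would normally replace this code in practice.

```python
import cmath
import math


def hz_to_mel(f):
    """One common mel-scale convention (an assumption, not the paper's)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for one short frame)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame; all default sizes are illustrative."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, n, sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # DCT-II decorrelates the log filterbank energies
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Synthetic 440 Hz tone standing in for a cough frame
demo_frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
demo_coeffs = mfcc_frame(demo_frame, sample_rate=8000)
```

Stacking such coefficient vectors over successive frames gives the two-dimensional array that a convolutional model can treat like an image.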

If this is right

  • Cough sound recordings contain information that allows statistical models to classify COVID-19 status above chance.
  • Data augmentation improves the performance of acoustic ConvNets when training data are limited.
  • MFCC features paired with convolutional layers form an effective pipeline for this classification task.
  • The reported model sets the highest score among submissions to the DiCOVA 2021 Track 1 leaderboard.
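
To make the pairing of MFCC features with convolutional layers concrete, here is a minimal sketch of a single convolution followed by ReLU on a toy coefficient matrix. The kernel values and the 4x5 input are invented for illustration; the paper's actual architecture (Figure 4) is not described on this page and is not reproduced here.

```python
def conv2d_valid(image, kernel):
    """'Valid' cross-correlation, the operation deep learning
    libraries usually call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectifier."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Toy "MFCC matrix": rows are coefficients, columns are time frames
toy_mfcc = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]
# Hypothetical kernel that responds to values rising along the time axis
rise_kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = relu(conv2d_valid(toy_mfcc, rise_kernel))
for row in features:
    print(row)
```

In a trained network the kernel weights are learned rather than hand-set, and many such feature maps are stacked before the classification layers.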

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same acoustic pipeline could be tested on other respiratory conditions to check whether the learned patterns are COVID-specific.
  • Deployment would require checking whether performance holds when recordings come from different microphones or environments.
  • Combining the acoustic score with other low-cost signals might raise overall screening accuracy without adding contact.

Load-bearing premise

Cough acoustics contain consistent, generalizable differences between COVID-19 positive and negative individuals that the chosen features and model can capture on the DiCOVA dataset distribution.

What would settle it

A new independent collection of cough recordings on which the augmented ConvNet achieves an AUC-ROC at or below the DiCOVA baseline level.
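
Scoring such an independent collection comes down to computing AUC-ROC from labels and model scores. A minimal sketch, assuming binary labels (1 for positive, 0 for negative) and using the pairwise-ranking definition of AUC; the labels and scores below are made up for illustration.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outscores a
    random negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
toy_auc = auc_roc(toy_labels, toy_scores)
print(f"AUC-ROC = {100 * toy_auc:.2f} percent")
```

The quadratic loop is fine for a few thousand recordings; a sort-based rank formulation gives the same value faster on larger sets.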

Figures

Figures reproduced from arXiv: 2110.06123 by Darsh Kaushik, Hoang Van Truong, Koushik Guha, Saranga Kingkor Mahanta, Shubham Jain.

Figure 1: Extracting Cepstral Coefficients from an audio signal
Figure 2: Distribution of duration of audio samples in the dataset
Figure 3: Class distribution of the augmented dataset
Figure 4: Proposed CNN architecture
    … architectures, including our proposed model and the baseline model, are shown in Table I. The baseline model of the DiCOVA challenge used a Random Forest classifier trained with fifty trees. The participants were instructed to submit the output probabilities corresponding to each sound recording name. The blind test set was provided in the DiCOVA 2021 challenge, along with the dat…
Figure 5: ROC curves depicting model performance on each fold
Figure 6: Averaged model decisions computed at 80% sensitivity

Original abstract

With the periodic rise and fall of COVID-19 and countries being inflicted by its waves, an efficient, economic, and effortless diagnosis procedure for the virus has been the utmost need of the hour. COVID-19 positive individuals may even be asymptomatic making the diagnosis difficult, but amongst the infected subjects, the asymptomatic ones need not be entirely free of symptoms caused by the virus. They might not show any observable symptoms like the symptomatic subjects, but they may differ from uninfected ones in the way they cough. These differences in the coughing sounds are minute and indiscernible to the human ear, however, these can be captured using machine learning-based statistical models. In this paper, we present a deep learning approach to analyze the acoustic dataset provided in Track 1 of the DiCOVA 2021 Challenge containing cough sound recordings belonging to both COVID-19 positive and negative examples. To perform the classification on the sound recordings as belonging to a COVID-19 positive or negative examples, we propose a ConvNet model. Our model achieved an AUC score percentage of 72.23 on the blind test set provided by the same for an unbiased evaluation of the models. The ConvNet model incorporated with Data Augmentation further increased the AUC-ROC percentage from 72.23 to 87.07. It also outperformed the DiCOVA 2021 Challenge's baseline model by 23% thus, claiming the top position on the DiCOVA 2021 Challenge leaderboard. This paper proposes the use of Mel frequency cepstral coefficients as the feature input for the proposed model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a ConvNet model that takes MFCC features extracted from cough recordings as input to classify samples as COVID-19 positive or negative. On the blind test set of DiCOVA 2021 Challenge Track 1 the model obtains an AUC-ROC of 72.23 %; the addition of data augmentation raises this to 87.07 %, outperforming the challenge baseline by 23 % and placing first on the leaderboard.

Significance. If the performance numbers prove robust under proper validation protocols and external testing, the work would illustrate that standard audio pipelines can achieve competitive results on a public cough-based COVID benchmark. The absence of methodological transparency, however, currently prevents any assessment of whether the reported lift reflects genuine acoustic biomarkers or dataset-specific covariates.

major comments (3)
  1. [Abstract] Abstract: The performance claims rest on AUC-ROC figures of 72.23 % and 87.07 % yet supply no information whatsoever on dataset cardinality, train-test split ratios, cross-validation scheme, stratification by device or demographics, or statistical testing. These omissions are load-bearing for the central generalization claim.
  2. [Abstract] Abstract: No description is given of the augmentation operations (type, parameters, or whether they were applied only to training data), making it impossible to determine whether the 15-point AUC gain arises from improved generalization or from unintended correlation with label distribution or test-set covariates.
  3. [Abstract] Abstract: The claim that the model “outperformed the DiCOVA 2021 Challenge’s baseline model by 23 %” and “claim[ed] the top position” cannot be evaluated without confirmation that the evaluation protocol exactly matches the challenge rules and that no post-hoc tuning on the blind test set occurred.
minor comments (1)
  1. [Abstract] The repeated phrase “AUC score percentage” is redundant; AUC-ROC is already expressed as a percentage.
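
The first major comment asks about the cross-validation scheme and stratification. As one way such a scheme could be set up, the sketch below builds stratified folds so that each fold keeps roughly the same class ratio. This is a generic illustration with hypothetical class counts; Figure 5's caption mentions per-fold ROC curves, but how the paper's folds were built is not described on this page.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the same class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical imbalanced dataset: 10 positives, 90 negatives
toy_cv_labels = [1] * 10 + [0] * 90
cv_folds = stratified_folds(toy_cv_labels, k=5)
for i, fold in enumerate(cv_folds):
    n_pos = sum(toy_cv_labels[j] for j in fold)
    print(f"fold {i}: {len(fold)} samples, {n_pos} positive")
```

With heavily imbalanced data, unstratified folds can end up with very few positives, which makes per-fold AUC estimates unstable.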

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed comments on the abstract. We address each point below and will revise the abstract to incorporate additional methodological details for improved transparency while preserving its conciseness.

  1. Referee: [Abstract] Abstract: The performance claims rest on AUC-ROC figures of 72.23 % and 87.07 % yet supply no information whatsoever on dataset cardinality, train-test split ratios, cross-validation scheme, stratification by device or demographics, or statistical testing. These omissions are load-bearing for the central generalization claim.

    Authors: The DiCOVA 2021 Track 1 challenge supplies a fixed training set and a blind test set for evaluation; no cross-validation or custom splits were used. Dataset cardinality and participant details are described in the manuscript body. Device and demographic stratification were not performed because the challenge data release did not include such metadata. We will add a brief statement to the abstract noting the use of the official challenge split and the absence of additional validation protocols. revision: yes

  2. Referee: [Abstract] Abstract: No description is given of the augmentation operations (type, parameters, or whether they were applied only to training data), making it impossible to determine whether the 15-point AUC gain arises from improved generalization or from unintended correlation with label distribution or test-set covariates.

    Authors: The augmentation pipeline (time stretching, pitch shifting, and additive noise) and its parameters are specified in the Methods section and were applied exclusively to the training partition using a validation subset drawn from the training data. We will insert a short clause in the abstract summarizing the augmentation strategy and confirming it was training-only. revision: yes

  3. Referee: [Abstract] Abstract: The claim that the model “outperformed the DiCOVA 2021 Challenge’s baseline model by 23 %” and “claim[ed] the top position” cannot be evaluated without confirmation that the evaluation protocol exactly matches the challenge rules and that no post-hoc tuning on the blind test set occurred.

    Authors: All submissions, including ours, were evaluated by the challenge organizers on the held-out blind test set via their official platform; test labels were never released to participants. The reported improvement and leaderboard position therefore reflect strict adherence to the published challenge protocol with no post-hoc tuning. We will append a confirming clause to the abstract. revision: yes
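
The second response names time stretching, pitch shifting, and additive noise, applied to training data only. The sketch below shows the training-only pattern with two simple stand-ins: Gaussian additive noise and a speed change by linear-interpolation resampling. The resampling alters duration and pitch together, so it is not a true time stretch or pitch shift. The noise level, the rate, and all function names are assumptions, not the paper's parameters.

```python
import random


def add_noise(samples, noise_std, rng):
    """Additive Gaussian noise with an assumed standard deviation."""
    return [x + rng.gauss(0.0, noise_std) for x in samples]


def resample_speed(samples, rate):
    """Speed change by linear interpolation; rate > 1 shortens the clip.
    Changes pitch as well as duration (a stand-in, not a phase vocoder)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    last = len(samples) - 1
    n_out = max(1, int(round(len(samples) / rate)))
    out = []
    for i in range(n_out):
        pos = min(i * rate, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def augment_training_set(train, rng, noise_std=0.005, rate=1.1):
    """Return originals plus two augmented copies per clip.
    Call this on the training partition only, never on test data."""
    out = list(train)
    for samples, label in train:
        out.append((add_noise(samples, noise_std, rng), label))
        out.append((resample_speed(samples, rate), label))
    return out


# Two hypothetical clips as (samples, label) pairs
aug_rng = random.Random(0)
toy_train = [([0.0, 1.0, 0.0, -1.0] * 25, 1), ([0.5] * 100, 0)]
toy_augmented = augment_training_set(toy_train, aug_rng)
print(len(toy_train), "->", len(toy_augmented), "training clips")
```

Keeping augmentation inside a function that only ever receives the training partition makes it harder for augmented copies of a clip to leak into validation or test data.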

Circularity Check

0 steps flagged

No circularity: standard ML pipeline on blind test set with no self-referential derivations or fitted predictions.

The paper reports training a ConvNet on MFCC features extracted from the DiCOVA 2021 cough dataset, with data augmentation, and evaluates AUC on the provided blind test set. No equations, parameter-fitting steps, uniqueness theorems, or self-citations are described that would reduce the reported performance lift (72.23 to 87.07) to a construction or post-hoc fit on the target metric. The result is presented as an empirical outcome on held-out data rather than a derived identity, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view supplies no explicit free parameters, axioms, or invented entities; the approach implicitly assumes that MFCC features are sufficient and that the challenge data distribution matches real-world use.

pith-pipeline@v0.9.0 · 5839 in / 1062 out tokens · 26378 ms · 2026-05-24T12:34:25.005229+00:00 · methodology
