Optimising MFCC parameters for the automatic detection of respiratory diseases

Frits M.E. Franssen; Lauren Reinders; Loes van Bemmel; Sami O. Simons; Visara Urovi; Yuyang Yan

arXiv: 2408.07522 · v2 · submitted 2024-08-14 · 💻 cs.SD · cs.LG · eess.AS

Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M.E. Franssen, Visara Urovi

Pith reviewed 2026-05-23 21:50 UTC · model grok-4.3

classification 💻 cs.SD · cs.LG · eess.AS

keywords MFCC parameters · respiratory disease detection · SVM classifier · acoustic biomarkers · COVID-19 sound database · Coswara dataset · SVD database · parameter optimization

The pith

Optimizing the number of MFCC coefficients, frame length, and hop length raises SVM accuracy for respiratory disease detection by 14.9 to 19.6 percentage points over the worst-case parameter choices tested.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that common default settings for extracting Mel Frequency Cepstral Coefficients from voice recordings are not optimal for spotting respiratory conditions. It tests how changing the number of coefficients, the frame length, and the hop length between frames changes the performance of a support vector machine classifier across four voice datasets. The authors report clear trends, such as falling accuracy with longer hop lengths and a sweet spot near 30 coefficients, plus dataset-specific behavior for frame length. A sympathetic reader would care because voice-based screening is non-invasive and cheap, so better feature settings could make automatic detection more reliable without new hardware or models.

Core claim

By systematically varying the number of MFCC coefficients, frame length, and hop length on four respiratory sound datasets and feeding the features to an SVM classifier, the study finds that the best parameter combination reaches 81.1 percent accuracy on the Cambridge COVID-19 Sound database, 80.6 percent on Coswara, and 71.7 percent on the SVD dataset. These figures represent gains of 19.6, 16.1, and 14.9 percentage points over the worst parameter combinations tested on each dataset. Additional patterns include declining accuracy with increasing hop length, an optimum near 30 coefficients, and opposite frame-length trends between the COVID-19 datasets and the SVD set.
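The abstract quotes the gains with a percent sign, while the editorial sections of this page write them as pp. As a quick arithmetic check, the sketch below back-computes the worst-case accuracy that each reading would imply. The function names are illustrative, and the worst-case accuracies themselves are not stated in the material shown here, so the output is a derived quantity under each assumption rather than a reported result.

```python
# Back-compute the worst-case accuracy implied by each reported best accuracy
# and gain, under two readings of "gain". Best/gain values come from the abstract.
REPORTED = {
    "Cambridge COVID-19 Sound": (81.1, 19.6),
    "Coswara": (80.6, 16.1),
    "SVD": (71.7, 14.9),
}


def implied_worst_pp(best, gain):
    """Worst accuracy if the gain is in absolute percentage points."""
    return round(best - gain, 1)


def implied_worst_relative(best, gain):
    """Worst accuracy if the gain is a relative increase over the worst case."""
    return round(best / (1.0 + gain / 100.0), 1)


for name, (best, gain) in REPORTED.items():
    print(name, implied_worst_pp(best, gain), implied_worst_relative(best, gain))
```

Under the percentage-point reading the implied worst cases sit between roughly 57 and 65 percent; the relative reading gives higher worst-case values, so the two interpretations are distinguishable once the paper's worst-case accuracies are known.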

What carries the argument

The three MFCC extraction parameters (number of coefficients, frame length, hop length) that control the acoustic feature vectors supplied to the SVM classifier for respiratory condition labeling.
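To make the roles of frame length and hop length concrete, here is a minimal sketch of how the two settings translate into sample counts and into the number of frames a recording yields. The 16 kHz sample rate, the 3 second duration, and the helper names are illustrative assumptions, not values or code from the paper.

```python
# Convert frame and hop lengths from milliseconds to samples and count the
# frames a recording yields. Sample rate and duration are assumed for the demo.

def ms_to_samples(ms, sample_rate):
    """Length in samples of a window given in milliseconds."""
    return int(round(ms * sample_rate / 1000.0))


def n_frames(n_samples, frame_len, hop_len):
    """Number of full frames that fit in the signal, without padding."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


sample_rate = 16000
n_samples = 3 * sample_rate
frame_len = ms_to_samples(25, sample_rate)  # 400 samples at 16 kHz
for hop_ms in (5, 10, 25):
    hop_len = ms_to_samples(hop_ms, sample_rate)
    print(hop_ms, "ms hop ->", n_frames(n_samples, frame_len, hop_len), "frames")
```

Each frame produces one vector of coefficients, so a longer hop means fewer vectors per recording, and the number of coefficients sets the width of each vector.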

If this is right

Accuracy falls steadily as hop length grows.
Roughly 30 coefficients give the highest performance across the tested sets.
Longer frame lengths hurt results on the two COVID-19 datasets but help on the SVD dataset.
The best parameter triple outperforms the worst by double-digit percentage points on each dataset.
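The best-versus-worst comparison can be sketched as an exhaustive sweep over parameter triples. In the sketch below, `evaluate` is a hypothetical callable that would, in a real study, extract MFCCs with the given settings, train the classifier, and return an accuracy; `toy_evaluate` is only a stand-in so the example runs, and it is not a model of the paper's results.

```python
from itertools import product


def sweep(n_coeffs_grid, frame_ms_grid, hop_ms_grid, evaluate):
    """Score every (n_coeffs, frame_ms, hop_ms) triple with `evaluate` and
    return the best triple, the worst triple, and the accuracy gap."""
    scores = {}
    for triple in product(n_coeffs_grid, frame_ms_grid, hop_ms_grid):
        scores[triple] = evaluate(*triple)
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return best, worst, scores[best] - scores[worst]


def toy_evaluate(n_coeffs, frame_ms, hop_ms):
    # Placeholder score so the sketch is runnable; a real `evaluate` would
    # train and test a classifier on features extracted with these settings.
    return 0.8 - 0.001 * abs(n_coeffs - 30) - 0.002 * hop_ms


best, worst, gap = sweep([13, 30, 40], [25, 50], [5, 10], toy_evaluate)
print("best:", best, "worst:", worst, "gap:", round(gap, 3))
```

Because the gap is defined against the worst triple on the grid, its size depends on how wide the grid is, which is worth keeping in mind when reading the double-digit figures.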

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If MFCC parameter choice matters this much, similar tuning sweeps could lift results in other audio-health tasks such as cough or snoring analysis.
Libraries that ship default MFCC settings may be handing researchers suboptimal features for medical sound work.
Future studies could test whether the same parameter trends appear when the classifier is swapped for a neural network.

Load-bearing premise

That the measured accuracy differences come only from the three MFCC parameters and are not produced by unstated choices in preprocessing, normalization, train-test splits, or missing statistical tests.

What would settle it

Re-running the exact same datasets and SVM with the reported best and worst parameter sets under documented cross-validation and error bars; if the accuracy gaps shrink below a few percent or reverse, the central claim does not hold.

Figures

Figures reproduced from arXiv: 2408.07522 by Frits M.E. Franssen, Lauren Reinders, Loes van Bemmel, Sami O. Simons, Visara Urovi, Yuyang Yan.

**Figure 2.** Audio framing illustration. Step 5. Logarithm (log). Take the logarithm of all filterbank energies to separate the excitation spectrum from the vocal system spectrum. Step 6. Discrete Cosine Transform (DCT). The basic concept of DCT is to correlate the value of the spectrum to produce essential information about the signal structure [24]. $C_m = \sum_{j=0}^{J-1} \cos\left(m \frac{\pi}{J}(j + 0.5)\right) \log_{10}(E_j), \quad 0 \le m \le L - 1$ (6) Fina…

**Figure 3.** Schematic representation of the detection system

**Figure 5.** The highest accuracy for each dataset was achieved at a frame length of 25 ms. However, for the SVD database, the highest accuracy was also achieved at 25 ms, but accuracy exhibited an overall increasing trend from 50 ms to 500 ms, with a decrease observed at 50 ms. This trend aligns with the findings of prior work on the SVD dataset by Tirronen et al. [20].

**Figure 4.** Different number of coefficients. 4.2 Frame length. In the procedure of MFCC extraction, the first step involves dividing the original time-domain signal into short frames, where the duration of each frame is defined as the frame length, denoted as N in

**Figure 7.** Different combinations with the SVM model

**Figure 9.** Different combinations with the LSTM model

**Figure 8.** Different combinations for different genders
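The DCT step quoted in the Figure 2 excerpt (equation 6) sums, over the J filterbank channels, the log energy of channel j weighted by cos(m · π/J · (j + 0.5)). A minimal pure-Python sketch of that formula, as read from the excerpt, is given below. The function name and the flat toy spectrum are illustrative, and `n_coeffs` plays the role of L, the number of coefficients kept; a real pipeline would normally rely on a library DCT.

```python
import math


def dct_cepstrum(energies, n_coeffs):
    """C_m = sum_{j=0}^{J-1} cos(m * pi / J * (j + 0.5)) * log10(E_j)
    for 0 <= m <= n_coeffs - 1, where J = len(energies)."""
    J = len(energies)
    log_e = [math.log10(e) for e in energies]
    return [
        sum(math.cos(m * math.pi / J * (j + 0.5)) * log_e[j] for j in range(J))
        for m in range(n_coeffs)
    ]


# Toy input: a flat spectrum of J = 8 filterbank energies, all equal to 10,
# so every log energy is 1.
coeffs = dct_cepstrum([10.0] * 8, n_coeffs=4)
print([round(c, 6) for c in coeffs])
```

For a flat spectrum only the zeroth coefficient is non-zero (it equals J times the common log energy), while higher coefficients vanish up to rounding. Raising the number of coefficients therefore adds progressively finer spectral detail to the feature vector.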

Original abstract

Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) is widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of key parameters, namely the number of coefficients, frame length, and hop length between frames, on respiratory condition examination. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrucken Voice Disorders (SVD) database, and a TACTICAS dataset. The Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that the accuracy of MFCC decreases as hop length increases, and the optimal number of coefficients is observed to be approximately 30. The performance of MFCC varies with frame length across the datasets: for the COVID-19 datasets (Cambridge COVID-19 Sound database and Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves with increasing frame length (from 50 ms to 500 ms). Furthermore, we investigate the optimized combination of these parameters and observe substantial enhancements in accuracy. Compared to the worst combination, the SVM model achieves an accuracy of 81.1%, 80.6%, and 71.7%, with improvements of 19.6%, 16.10%, and 14.90% for the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper supplies dataset-specific MFCC tuning numbers that can lift SVM accuracy by roughly 15 to 20 pp on respiratory tasks, but the gains rest on an unvalidated parameter search whose protocol is not described.

The main point is that a grid search over MFCC coefficients, frame length, and hop length produces measurable accuracy lifts for SVM on three respiratory datasets, with the best reported settings reaching 81.1 %, 80.6 %, and 71.7 % and gains of roughly 15-20 pp over the worst settings. The directional patterns (accuracy falls with longer hops, ~30 coefficients look best, frame-length effects reverse between COVID and SVD data) are consistent enough across the four corpora to be worth noting for anyone already using MFCC features in this domain.

Referee Report

2 major / 1 minor

Summary. The paper claims that MFCC parameters (number of coefficients, frame length, hop length) substantially affect SVM accuracy for respiratory disease detection. It reports directional effects (accuracy falls with larger hop length; ~30 coefficients optimal; frame-length effects differ by dataset) and that an optimized triple yields 19.6 pp, 16.1 pp and 14.9 pp gains (to 81.1 %, 80.6 % and 71.7 %) on the Cambridge COVID-19, Coswara and SVD datasets relative to the worst triple examined.

Significance. If the optimization procedure is shown to be robust, the empirical results across three datasets would supply practical guidance on MFCC tuning for acoustic respiratory biomarkers. The direct measurement on held-out data and the reporting of consistent directional trends are positive features; however, the absence of any validation protocol for selecting the reported optimum currently prevents the quantitative gains from being treated as reliable.

major comments (2)

[Abstract] Abstract: the headline claim of 19.6/16.1/14.9 pp improvements from an 'optimized combination' is presented without any description of how that combination was identified (held-out validation split, nested CV, grid-search protocol, or multiple-testing correction). This is load-bearing because the largest accuracy observed on a finite grid evaluated on the same data used for selection is expected to be upward-biased, directly undermining the reported gains.
[Abstract] Abstract / Results: no information is supplied on the train-test split protocol, cross-validation procedure, number of runs, or statistical significance of the accuracy differences. Without these controls it is impossible to determine whether the observed differences arise solely from the three MFCC parameters or are confounded by unstated preprocessing, normalization or split choices.
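The upward bias raised in the first major comment can be illustrated with a small simulation. In the sketch below every setting on the grid has the same true accuracy, yet the best observed test accuracy is systematically higher. The numbers (true accuracy 0.70, 200 test items, 50 settings) are arbitrary assumptions, and the settings are simulated as independent, which real settings scored on one shared test set would not be.

```python
import random


def mean_best_of_grid(true_acc, n_test, n_settings, n_trials, seed=0):
    """Average, over trials, of the highest test accuracy among `n_settings`
    settings that all share the same true accuracy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        best = 0.0
        for _ in range(n_settings):
            # Each test item is classified correctly with probability true_acc.
            correct = sum(rng.random() < true_acc for _ in range(n_test))
            best = max(best, correct / n_test)
        total += best
    return total / n_trials


print("single setting:", mean_best_of_grid(0.70, 200, 1, 100))
print("best of 50:    ", mean_best_of_grid(0.70, 200, 50, 100))
```

The gap between the two printed values is pure selection effect, which is why a separate validation split or nested cross-validation is the usual way to report the accuracy of a selected setting.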

minor comments (1)

[Abstract] The abstract mentions a fourth dataset (TACTICAS) but reports quantitative results only for three; a brief statement of its outcome or reason for omission would improve completeness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on experimental reporting. We agree that the abstract and results lack necessary protocol details and will revise the manuscript to include them. The directional trends (hop length, coefficient count, dataset-dependent frame length) remain the core contribution and are unaffected by the reporting gaps.

Referee: [Abstract] Abstract: the headline claim of 19.6/16.1/14.9 pp improvements from an 'optimized combination' is presented without any description of how that combination was identified (held-out validation split, nested CV, grid-search protocol, or multiple-testing correction). This is load-bearing because the largest accuracy observed on a finite grid evaluated on the same data used for selection is expected to be upward-biased, directly undermining the reported gains.

Authors: We agree the selection procedure must be stated explicitly. The optimized triple was obtained by exhaustive enumeration of the three-parameter grid and selection of the combination that maximized accuracy on a fixed held-out test partition (stratified 80/20 split, same partition used for all reported triples). We will revise the abstract to read: 'Using grid search over the MFCC parameter space and evaluation on a held-out test set, the best triple yields accuracies of 81.1 %, 80.6 % and 71.7 % ...' We will also add a short discussion of selection bias and report mean accuracy plus standard deviation across five independent random splits to quantify variability. revision: yes
Referee: [Abstract] Abstract / Results: no information is supplied on the train-test split protocol, cross-validation procedure, number of runs, or statistical significance of the accuracy differences. Without these controls it is impossible to determine whether the observed differences arise solely from the three MFCC parameters or are confounded by unstated preprocessing, normalization or split choices.

Authors: We acknowledge the omission. All experiments used a consistent stratified 80/20 train-test split with no data leakage; SVM hyperparameters were tuned via 5-fold cross-validation on the training portion only. We will insert a concise 'Experimental Protocol' paragraph in the methods and results sections stating the split ratio, that the identical split was reused across all MFCC triples, the number of random seeds (five), and paired t-test p-values for the accuracy differences. This will confirm that the reported gains are attributable to the MFCC parameters under controlled conditions. revision: yes
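The protocol described in the simulated rebuttal (a stratified 80/20 split repeated over five random seeds) could be sketched as follows. This is an illustrative standard-library implementation rather than the authors' code; the function name and the toy labels are assumptions, and a real study would more likely use an established library routine.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split at ~test_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 70 + [1] * 30  # toy labels, not drawn from any of the datasets
splits = [stratified_split(labels, seed=s) for s in range(5)]
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Reusing the same list of splits for every MFCC triple keeps the comparison paired, so the mean and standard deviation of accuracy across the five seeds can be compared setting by setting.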

Circularity Check

0 steps flagged

No circularity: empirical grid search over MFCC parameters evaluated directly on datasets

full rationale

The paper conducts an experimental study: it extracts MFCC features under varying (n_coeffs, frame_len, hop_len) combinations, trains SVM classifiers on four respiratory-disease datasets, and reports measured accuracies. No equations, derivations, or self-referential definitions appear; the reported improvements (19.6 pp, 16.1 pp, 14.9 pp) are direct empirical outcomes on the data, not quantities forced by construction or by self-citation chains. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axiom · 0 invented entities

The claim rests on standard supervised-learning assumptions plus the empirical observation that accuracy varies with the three MFCC settings; no new entities or ad-hoc axioms are introduced.

free parameters (3)

number of MFCC coefficients
Empirically identified optimum of approximately 30; value chosen after testing to maximize reported accuracy.
frame length
Dataset-specific values between 50 ms and 500 ms selected to optimize performance.
hop length
Shorter values preferred after observing accuracy decline with larger hops.

axioms (1)

domain assumption: SVM is an effective and widely adopted classifier for audio-based respiratory classification
Stated justification for classifier choice; no comparison to alternatives provided.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 2 internal anchors

[1] Forum of International Respiratory Societies, The Global Impact of Respiratory Disease, Second Edition. Sheffield: European Respiratory Society, 2017.
[2] A. M. MacLarnon and G. P. Hewitt, “The evolution of human speech: The role of enhanced breathing control,” American Journal of Physical Anthropology: The Official Publication of the American Association of Physical Anthropologists, vol. 109, no. 3, pp. 341–363, 1999.
[3] P. K. Monoson and A. W. Fox, “Preliminary observation of speech disorder in obstructive and mixed sleep apnea,” Chest, vol. 92, no. 4, pp. 670–675, 1987.
[4] M. Al Ismail, S. Deshmukh, and R. Singh, “Detection of covid-19 through the analysis of vocal fold oscillations,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 1035–1039.
[5] C. S. Wood, M. R. Thomas, J. Budd, T. P. Mashamba-Thompson, K. Herbst, D. Pillay, R. W. Peeling, A. M. Johnson, R. A. McKendry, and M. M. Stevens, “Taking connected mobile-health diagnostics of infectious diseases to the field,” Nature, vol. 566, no. 7745, pp. 467–474, 2019.
[6] X. Jiang, M. Coffee, A. Bari, J. Wang, X. Jiang, J. Huang, J. Shi, J. Dai, J. Cai, T. Zhang et al., “Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity,” Computers, Materials & Continua, vol. 63, no. 1, pp. 537–551, 2020.
[7] M. A. Al-Antari, C.-H. Hua, J. Bang, and S. Lee, “Fast deep learning computer-aided diagnosis of covid-19 based on digital chest x-ray images,” Applied Intelligence, vol. 51, no. 5, pp. 2890–2907, 2021.
[8] T. Xia, J. Han, and C. Mascolo, “Exploring machine learning for audio-based respiratory condition screening: A concise review of databases, methods, and open issues,” Experimental Biology and Medicine, vol. 247, no. 22, pp. 2053–2061, 2022.
[9] T. Keating, “Resapp technology to diagnose and manage respiratory disease,” Australasian Biotechnology, vol. 25, no. 1, p. 16, 2015.
[10] B. W. Schuller, D. M. Schuller, K. Qian, J. Liu, H. Zheng, and X. Li, “Covid-19 and computer audition: An overview on what speech & sound analysis could contribute in the sars-cov-2 corona crisis,” Frontiers in digital health, vol. 3, p. 564906, 2021.
[11] J. A. Pandit, J. M. Radin, G. Quer, and E. J. Topol, “Smartphone apps in the covid-19 pandemic,” Nature Biotechnology, vol. 40, no. 7, pp. 1013–1022, 2022.
[12] M. Wei, J. Du, X. Wang, H. Lu, W. Wang, and P. Lin, “Voice disorders in severe obstructive sleep apnea patients and comparison of two acoustic analysis software programs: Mdvp and praat,” Sleep and Breathing, vol. 25, pp. 433–439, 2021.
[13] K. Mridha, S. Sarkar, and D. Kumar, “Respiratory disease classification by cnn using mfcc,” in 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA). IEEE, 2021, pp. 517–523.
[14] N. Takahashi, M. Gygli, and L. Van Gool, “Aenet: Learning deep audio features for video analysis,” IEEE Transactions on Multimedia, vol. 20, no. 3, pp. 513–524, 2017.
[15] M. Bezoui, A. Elmoutaouakkil, and A. Beni-hssane, “Feature extraction of some quranic recitation using mel-frequency cepstral coeficients (mfcc),” in 2016 5th international conference on multimedia computing and systems (ICMCS). IEEE, 2016, pp. 127–131.
[16] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE transactions on acoustics, speech, and signal processing, vol. 28, no. 4, pp. 357–366, 1980.
[17] S. Bromuri, A. P. Henkel, D. Iren, and V. Urovi, “Using ai to predict service agent stress from emotion patterns in service interactions,” Journal of Service Management, vol. 32, no. 4, pp. 581–611, 2021.
[18] X. Liu, M. Sahidullah, and T. Kinnunen, “Learnable mfccs for speaker verification,” in 2021 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2021, pp. 1–5.
[19] C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, “Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data,” arXiv preprint arXiv:2006.05919, 2020.
[20] S. Tirronen, S. R. Kadiri, and P. Alku, “The effect of the mfcc frame length in automatic voice pathology detection,” Journal of Voice, 2022.
[21] M. Zakariah, Y. Ajmi Alotaibi, Y. Guo, K. Tran-Trung, M. M. Elahi et al., “An analytical study of speech pathology detection based on mfcc and deep neural networks,” Computational and Mathematical Methods in Medicine, vol. 2022, 2022.
[22] J. Naeem, N. A. Hamzaid, M. A. Islam, A. W. Azman, and M. Bijak, “Mechanomyography-based muscle fatigue detection during electrically elicited cycling in patients with spinal cord injury,” Medical & biological engineering & computing, vol. 57, pp. 1199–1211, 2019.
[23] S. Gupta, J. Jaafar, W. W. Ahmad, and A. Bansal, “Feature extraction using mfcc,” Signal & Image Processing: An International Journal, vol. 4, no. 4, pp. 101–108, 2013.
[24] L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using mel frequency cepstral coefficient (mfcc) and dynamic time warping (dtw) techniques,” arXiv preprint arXiv:1003.4083, 2010.
[25] K. Phua, J. Chen, T. H. Dat, and L. Shue, “Heart sound as a biometric,” Pattern recognition, vol. 41, no. 3, pp. 906–919, 2008.
[26] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. K. Ghosh, S. Ganapathy et al., “Coswara–a database of breathing, cough, and voice sounds for covid-19 diagnosis,” arXiv preprint arXiv:2005.10548, 2020.
[27] D. Martínez, E. Lleida, A. Ortega, A. Miguel, and J. Villalba, “Voice pathology detection on the saarbrücken voice database with calibration and fusion of scores using multifocal toolkit,” in Advances in Speech and Language Technologies for Iberian Languages: IberSPEECH 2012 Conference, Madrid, Spain, November 21-23, 2012. Proceedings. Springer, 2012,...
[28] “Telemonitoring for asthma and copd through voice analysis: the tacticas study.” [Online]. Available: https://onderzoekmetmensen.nl/en/trial/27652
[29] N. K. Leidy, L. T. Murray, B. U. Monz, L. Nelsen, M. Goldman, P. W. Jones, E. J. Dansie, and S. Sethi, “Measuring respiratory symptoms of copd: performance of the exact-respiratory symptoms tool (e-rs) in three clinical trials,” Respiratory Research, vol. 15, no. 1, pp. 1–10, 2014.
[30] J. Yang, S. Rahardja, and P. Fränti, “Outlier detection: how to threshold outlier scores?” in Proceedings of the international conference on artificial intelligence, information processing and cloud computing, 2019, pp. 1–6.
[31] A. Chowdhury and A. Ross, “Fusing mfcc and lpc features using 1d triplet cnn for speaker recognition in severely degraded audio signals,” IEEE transactions on information forensics and security, vol. 15, pp. 1616–1629, 2019.
[32] E. Rejaibi, A. Komaty, F. Meriaudeau, S. Agrebi, and A. Othmani, “Mfcc-based recurrent neural network for automatic clinical depression recognition and assessment from speech,” Biomedical Signal Processing and Control, vol. 71, p. 103107, 2022.
[33] A. Malek, “Spafe: Simplified python audio features extraction,” Journal of Open Source Software, vol. 8, no. 81, p. 4739, 2023.
[34] O. Kramer and O. Kramer, “Scikit-learn,” Machine learning for evolution strategies, pp. 45–53, 2016.
[35] J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua, and A. Lopez, “A comprehensive survey on support vector machine classification: Applications, challenges and trends,” Neurocomputing, vol. 408, pp. 189–215, 2020.
[36] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu, and B. Wu, “At-lstm: An attention-based lstm model for financial time series prediction,” in IOP Conference Series: Materials Science and Engineering, vol. 569, no. 5. IOP Publishing, 2019, p. 052037.
[37] R. Amin, M. A. Al Ghamdi, S. H. Almotiri, M. Alruily et al., “Healthcare techniques through deep learning: issues, challenges and opportunities,” IEEE Access, vol. 9, pp. 98523–98541, 2021.
[38] F. Shahid, A. Zameer, and M. Muneeb, “Predictions for covid-19 with deep learning models of lstm, gru and bi-lstm,” Chaos, Solitons & Fractals, vol. 140, p. 110212, 2020.
[39] Y. Yuyang, W. Aljbawi, S. O. Simmons, and V. Urovi, “Developing a multi-variate prediction model for covid-19 from crowd-sourced respiratory voice data,” arXiv:2402.07619, 2024.
[40] Spafe Documentation, “Spafe.features.mfcc,” 2019, copyright 2019. https://spafe.readthedocs.io/en/latest/features/mfcc.html
[41] D. Mitrović, M. Zeppelzauer, and C. Breiteneder, “Features for content-based audio retrieval,” in Advances in computers. Elsevier, 2010, vol. 78, pp. 71–150.

[1] [1]

Sheffield: European Respiratory Society, 2017

Forum of International Respiratory Societies, The Global Impact of Respiratory Disease Second Edition. Sheffield: European Respiratory Society, 2017

work page 2017

[2] [2]

The evolution of human speech: The role of enhanced breathing control,

A. M. MacLarnon and G. P . Hewitt, “The evolution of human speech: The role of enhanced breathing control,” American Journal of Physical Anthropology: The Official Publication of the American Association of Physical Anthropologists , vol. 109, no. 3, pp. 341–363, 1999

work page 1999

[3] [3]

Preliminary observation of speech disorder in obstructive and mixed sleep apnea,

P . K. Monoson and A. W. Fox, “Preliminary observation of speech disorder in obstructive and mixed sleep apnea,” Chest, vol. 92, no. 4, pp. 670–675, 1987

work page 1987

[4] [4]

Detection of covid-19 through the analysis of vocal fold oscillations,

M. Al Ismail, S. Deshmukh, and R. Singh, “Detection of covid-19 through the analysis of vocal fold oscillations,” in ICASSP 2021- 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 1035–1039

work page 2021

[5] [5]

Taking connected mobile-health diagnostics of infectious diseases to the field,

C. S. Wood, M. R. Thomas, J. Budd, T. P . Mashamba-Thompson, K. Herbst, D. Pillay, R. W. Peeling, A. M. Johnson, R. A. McKendry, and M. M. Stevens, “Taking connected mobile-health diagnostics of infectious diseases to the field,” Nature, vol. 566, no. 7745, pp. 467–474, 2019

work page 2019

[6] [6]

Towards an artificial intelli- gence framework for data-driven prediction of coronavirus clinical severity,

X. Jiang, M. Coffee, A. Bari, J. Wang, X. Jiang, J. Huang, J. Shi, J. Dai, J. Cai, T. Zhang et al. , “Towards an artificial intelli- gence framework for data-driven prediction of coronavirus clinical severity,” Computers, Materials & Continua , vol. 63, no. 1, pp. 537– 551, 2020

work page 2020

[7] [7]

Fast deep learning computer-aided diagnosis of covid-19 based on digital chest x-ray images,

M. A. Al-Antari, C.-H. Hua, J. Bang, and S. Lee, “Fast deep learning computer-aided diagnosis of covid-19 based on digital chest x-ray images,” Applied Intelligence, vol. 51, no. 5, pp. 2890– 2907, 2021

work page 2021

[8] [8]

Exploring machine learning for audio-based respiratory condition screening: A concise review of databases, methods, and open issues,

T. Xia, J. Han, and C. Mascolo, “Exploring machine learning for audio-based respiratory condition screening: A concise review of databases, methods, and open issues,” Experimental Biology and Medicine, vol. 247, no. 22, pp. 2053–2061, 2022

work page 2053

[9] [9]

Resapp technology to diagnose and manage respira- tory disease,

T. Keating, “Resapp technology to diagnose and manage respira- tory disease,” Australasian Biotechnology, vol. 25, no. 1, p. 16, 2015

work page 2015

[10] [10]

Covid-19 and computer audition: An overview on what speech & sound analysis could contribute in the sars-cov-2 corona crisis,

B. W. Schuller, D. M. Schuller, K. Qian, J. Liu, H. Zheng, and X. Li, “Covid-19 and computer audition: An overview on what speech & sound analysis could contribute in the sars-cov-2 corona crisis,” Frontiers in digital health, vol. 3, p. 564906, 2021

work page 2021

[11] [11]

Smartphone apps in the covid-19 pandemic,

J. A. Pandit, J. M. Radin, G. Quer, and E. J. Topol, “Smartphone apps in the covid-19 pandemic,” Nature Biotechnology , vol. 40, no. 7, pp. 1013–1022, 2022

work page 2022

[12] M. Wei, J. Du, X. Wang, H. Lu, W. Wang, and P. Lin, “Voice disorders in severe obstructive sleep apnea patients and comparison of two acoustic analysis software programs: Mdvp and praat,” Sleep and Breathing, vol. 25, pp. 433–439, 2021

[13] K. Mridha, S. Sarkar, and D. Kumar, “Respiratory disease classification by cnn using mfcc,” in 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA). IEEE, 2021, pp. 517–523

[14] N. Takahashi, M. Gygli, and L. Van Gool, “Aenet: Learning deep audio features for video analysis,” IEEE Transactions on Multimedia, vol. 20, no. 3, pp. 513–524, 2017

[15] M. Bezoui, A. Elmoutaouakkil, and A. Beni-hssane, “Feature extraction of some quranic recitation using mel-frequency cepstral coeficients (mfcc),” in 2016 5th international conference on multimedia computing and systems (ICMCS). IEEE, 2016, pp. 127–131

[16] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE transactions on acoustics, speech, and signal processing, vol. 28, no. 4, pp. 357–366, 1980

[17] S. Bromuri, A. P. Henkel, D. Iren, and V. Urovi, “Using ai to predict service agent stress from emotion patterns in service interactions,” Journal of Service Management, vol. 32, no. 4, pp. 581–611, 2021

[18] X. Liu, M. Sahidullah, and T. Kinnunen, “Learnable mfccs for speaker verification,” in 2021 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2021, pp. 1–5

[19] C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, “Exploring automatic diagnosis of covid-19 from crowdsourced respiratory sound data,” arXiv preprint arXiv:2006.05919, 2020

[20] S. Tirronen, S. R. Kadiri, and P. Alku, “The effect of the mfcc frame length in automatic voice pathology detection,” Journal of Voice, 2022

[21] M. Zakariah, Y. Ajmi Alotaibi, Y. Guo, K. Tran-Trung, M. M. Elahi et al., “An analytical study of speech pathology detection based on mfcc and deep neural networks,” Computational and Mathematical Methods in Medicine, vol. 2022, 2022

[22] J. Naeem, N. A. Hamzaid, M. A. Islam, A. W. Azman, and M. Bijak, “Mechanomyography-based muscle fatigue detection during electrically elicited cycling in patients with spinal cord injury,” Medical & biological engineering & computing, vol. 57, pp. 1199–1211, 2019

[23] S. Gupta, J. Jaafar, W. W. Ahmad, and A. Bansal, “Feature extraction using mfcc,” Signal & Image Processing: An International Journal, vol. 4, no. 4, pp. 101–108, 2013

[24] L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using mel frequency cepstral coefficient (mfcc) and dynamic time warping (dtw) techniques,” arXiv preprint arXiv:1003.4083, 2010

[25] K. Phua, J. Chen, T. H. Dat, and L. Shue, “Heart sound as a biometric,” Pattern recognition, vol. 41, no. 3, pp. 906–919, 2008

[26] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. K. Ghosh, S. Ganapathy et al., “Coswara–a database of breathing, cough, and voice sounds for covid-19 diagnosis,” arXiv preprint arXiv:2005.10548, 2020

[27] D. Martínez, E. Lleida, A. Ortega, A. Miguel, and J. Villalba, “Voice pathology detection on the saarbrücken voice database with calibration and fusion of scores using multifocal toolkit,” in Advances in Speech and Language Technologies for Iberian Languages: IberSPEECH 2012 Conference, Madrid, Spain, November 21-23, 2012. Proceedings. Springer, 2012,...

[28] “Telemonitoring for asthma and copd through voice analysis: the tacticas study.” [Online]. Available: https://onderzoekmetmensen.nl/en/trial/27652

[29] N. K. Leidy, L. T. Murray, B. U. Monz, L. Nelsen, M. Goldman, P. W. Jones, E. J. Dansie, and S. Sethi, “Measuring respiratory symptoms of copd: performance of the exact-respiratory symptoms tool (e-rs) in three clinical trials,” Respiratory Research, vol. 15, no. 1, pp. 1–10, 2014

[30] J. Yang, S. Rahardja, and P. Fränti, “Outlier detection: how to threshold outlier scores?” in Proceedings of the international conference on artificial intelligence, information processing and cloud computing, 2019, pp. 1–6

[31] A. Chowdhury and A. Ross, “Fusing mfcc and lpc features using 1d triplet cnn for speaker recognition in severely degraded audio signals,” IEEE transactions on information forensics and security, vol. 15, pp. 1616–1629, 2019

[32] E. Rejaibi, A. Komaty, F. Meriaudeau, S. Agrebi, and A. Othmani, “Mfcc-based recurrent neural network for automatic clinical depression recognition and assessment from speech,” Biomedical Signal Processing and Control, vol. 71, p. 103107, 2022

[33] A. Malek, “Spafe: Simplified python audio features extraction,” Journal of Open Source Software, vol. 8, no. 81, p. 4739, 2023

[34] O. Kramer and O. Kramer, “Scikit-learn,” Machine learning for evolution strategies, pp. 45–53, 2016

[35] J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua, and A. Lopez, “A comprehensive survey on support vector machine classification: Applications, challenges and trends,” Neurocomputing, vol. 408, pp. 189–215, 2020

[36] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu, and B. Wu, “At-lstm: An attention-based lstm model for financial time series prediction,” in IOP Conference Series: Materials Science and Engineering, vol. 569, no. 5. IOP Publishing, 2019, p. 052037

[37] R. Amin, M. A. Al Ghamdi, S. H. Almotiri, M. Alruily et al., “Healthcare techniques through deep learning: issues, challenges and opportunities,” IEEE Access, vol. 9, pp. 98 523–98 541, 2021

[38] F. Shahid, A. Zameer, and M. Muneeb, “Predictions for covid-19 with deep learning models of lstm, gru and bi-lstm,” Chaos, Solitons & Fractals, vol. 140, p. 110212, 2020

[39] Y. Yuyang, W. Aljbawi, S. O. Simmons, and V. Urovi, “Developing a multi-variate prediction model for covid-19 from crowd-sourced respiratory voice data,” arXiv:2402.07619, 2024

[40] Spafe Documentation, “Spafe.features.mfcc,” 2019, copyright 2019. https://spafe.readthedocs.io/en/latest/features/mfcc.html

[41] D. Mitrović, M. Zeppelzauer, and C. Breiteneder, “Features for content-based audio retrieval,” in Advances in computers. Elsevier, 2010, vol. 78, pp. 71–150