pith. sign in

arxiv: 1907.06129 · v1 · pith:S67QZSI6new · submitted 2019-07-13 · 💻 cs.SD · cs.LG· eess.AS

Towards Robust Voice Pathology Detection

Pith reviewed 2026-05-24 21:37 UTC · model grok-4.3

classification 💻 cs.SD cs.LGeess.AS
keywords voice pathology detectionacoustic featuresMFCCXGBoostDenseNetIsolation Forestsustained phonationvowel /a/
0
0 comments X

The pith

Merging four databases of vowel phonations lets XGBoost reach F1 0.733 for pathology detection using acoustic features and MFCCs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether classifiers from supervised learning and anomaly detection can identify pathological voices from acoustic recordings of the sustained vowel /a/. It combines four separate databases containing both normophonic and pathological samples without limiting the set of pathologies. Experiments evaluate raw waveforms, spectrograms, MFCCs, and conventional acoustic features across XGBoost, DenseNet, and Isolation Forest. The work presents itself as the first to pool these databases and to apply gradient boosting and deep learning in this setting. The highest F1 score on a dedicated test set is 0.733 with XGBoost, indicating that the combined data and chosen methods can support more general detection than prior single-database studies.

Core claim

Merging four independent databases of normophonic and pathological sustained /a/ phonations and evaluating gradient boosted trees, deep networks, and anomaly detection produces the following best F1 scores on a held-out test set: 0.733 for XGBoost using acoustic features together with MFCCs, 0.621 for DenseNet using MFCCs, and 0.610 for Isolation Forest using acoustic features.

What carries the argument

The merged corpus of four databases of sustained vowel /a/ recordings, represented by acoustic (dysphonic) features and MFCCs, supplied to XGBoost, DenseNet, and Isolation Forest.

If this is right

  • XGBoost with the combination of acoustic features and MFCCs delivers the highest supervised classification performance among the three methods tested.
  • DenseNet on MFCCs achieves moderate results that could scale with additional data volume.
  • Isolation Forest reaches comparable performance without needing labeled pathological examples in training.
  • Pooling multiple databases expands coverage of recording conditions and pathology variety beyond any single prior study.
  • The reported scores establish a baseline showing that gradient boosting and deep learning are viable for objective voice pathology screening.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Performance on entirely new recording hardware or patient populations outside the four databases would need separate validation to confirm robustness.
  • The advantage of combining acoustic features with MFCCs suggests that traditional dysphonic measures add information not captured by spectral coefficients alone.
  • Extending the same pipeline to other vowels or running speech could test whether results depend on the /a/ phonation task.
  • Integration into clinical workflows would require prospective studies that track whether the classifier output correlates with treatment outcomes.

Load-bearing premise

The four databases can be merged into a single training and test distribution without significant domain shift from differing recording equipment, labeling criteria, or patient demographics, and that the dedicated test set split avoids leakage while remaining representative of unseen pathologies.

What would settle it

Training on three of the databases and testing on the fourth would produce a large drop in F1 score if domain shift between recording conditions prevents generalization.

Figures

Figures reproduced from arXiv: 1907.06129 by Jesus B. Alonso-Hernandez, Jiri Mekyska, Pavol Harar, Radim Burget, Zdenek Smekal, Zoltan Galaz.

Figure 1
Figure 1. Figure 1: Visualization of inequality of samples per vocal pathology in the datasets used in this work (only 5 most [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
read the original abstract

Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to be the first to combine four heterogeneous databases of normophonic and pathological sustained /a/ phonations (unrestricted to specific vocal pathologies) and the first to apply gradient boosted trees (XGBoost) and deep learning (DenseNet) to voice pathology detection. It reports best F1 scores on a dedicated test set of 0.733 (XGBoost with AF+MFCC), 0.621 (DenseNet with MFCC), and 0.610 (Isolation Forest with AF), while framing the work as exploratory.

Significance. If the performance claims hold after addressing validation details, the work would be significant as an early demonstration of gradient boosting and CNNs on a large multi-database corpus for non-invasive voice pathology detection. The combination of four databases is a clear strength if domain shift and leakage are controlled; the exploratory framing and use of modern methods (XGBoost, DenseNet) on acoustic features and MFCCs add value over prior single-database studies.

major comments (2)
  1. [Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.
  2. [Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.
minor comments (1)
  1. [Abstract] Abstract: the phrasing 'conducted experiments do show promising potential' is slightly awkward and could be tightened for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that additional methodological context is warranted and will revise the abstract accordingly while preserving the paper's exploratory framing.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.

    Authors: We acknowledge that the abstract omits key experimental details. The full manuscript specifies a patient-level train/test split across the four databases to avoid speaker leakage, with hyperparameter selection performed via cross-validation on the training portion. Statistical significance testing and error bars were omitted due to the exploratory nature of the study. We will revise the abstract to include a concise description of the patient-level dedicated test set and hyperparameter tuning procedure. revision: yes

  2. Referee: [Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.

    Authors: The manuscript already describes the work as exploratory and does not assert that domain shift has been eliminated. We will revise the abstract to clarify that the four-database combination increases diversity but that no cross-database normalization or domain-adversarial training was applied, and that all splits are performed at the patient level. This will temper the robustness claim without altering the reported results. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML results on held-out data

full rationale

The paper reports F1 scores from training XGBoost, DenseNet, and Isolation Forest on AF/MFCC features extracted from four merged voice databases, evaluated on a dedicated test set. No equations, first-principles derivations, fitted parameters relabeled as predictions, or self-citation chains appear in the abstract or described methodology. All performance numbers are direct measurements from standard supervised and anomaly-detection pipelines; the work is self-contained against external benchmarks with no reduction of claims to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical machine learning study on audio classification; no explicit free parameters beyond standard model hyperparameters, no domain axioms, and no invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5774 in / 1242 out tokens · 40185 ms · 2026-05-24T21:37:27.384824+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 3 internal anchors

  1. [1]

    In: Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Confer- ence on, pp

    Al-nasheri, A., Ali, Z., Muhammad, G., Alsulaiman, M.: Voice pathology detection using auto-correlation of differ- ent filters bank. In: Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Confer- ence on, pp. 50–55. IEEE (2014)

  2. [2]

    Journal of Voice 31(1), 3–15 (2017)

    Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z.: Investigation of voice pathology detection and classifica- tion on different frequency regions using correlation func- tions. Journal of Voice 31(1), 3–15 (2017)

  3. [3]

    IEEE Ac- cess PP(99), 1–1 (2017)

    Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z., Malki, K., Mesallam, T., Farahat, M.: Voice pathology detection and classification using auto-correlation and entropy features in different frequency regions. IEEE Ac- cess PP(99), 1–1 (2017). DOI 10.1109/ACCESS.2017. 2696056

  4. [4]

    Journal of Voice 31(1), 113–e9 (2017)

    Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z., Mesallam, T.A., Farahat, M., Malki, K.H., Bencherif, M.A.: An investigation of multidimensional voice pro- gram parameters in three different databases for voice pathology detection and classification. Journal of Voice 31(1), 113–e9 (2017)

  5. [5]

    Ali, Z., Alsulaiman, M., Muhammad, G., Elamvazuthi, I., Al-nasheri, A., Mesallam, T.A., Farahat, M., Malki, K.H.: Intra-and inter-database study for arabic, english, and german databases: Do conventional speech features detect voice pathology? Journal of Voice 31(3), 386–e1 (2017)

  6. [6]

    IEEE Access 5, 3900–3908 (2017)

    Ali, Z., Muhammad, G., Alhamid, M.F.: An automatic health monitoring system for patients suffering from voice complications in smart cities. IEEE Access 5, 3900–3908 (2017)

  7. [7]

    Computers & Electrical Engineering 57, 257–265 (2017)

    Amami, R., Smiti, A.: An incremental method combining density clustering and support vector machines for voice pathology detection. Computers & Electrical Engineering 57, 257–265 (2017)

  8. [8]

    Logopedics Phoniatrics Vocology 36(2), 60–69 (2011)

    Arias-Londo˜ no, J.D., Godino-Llorente, J.I., Markaki, M., Stylianou, Y.: On combining information from modula- tion spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices. Logopedics Phoniatrics Vocology 36(2), 60–69 (2011)

  9. [9]

    Sociology 31(3), 597–606 (1997) 10 Pavol Harar et al

    Armstrong, D., Gosling, A., Weinman, J., Marteau, T.: The place of inter-rater reliability in qualitative research: an empirical study. Sociology 31(3), 597–606 (1997) 10 Pavol Harar et al

  10. [10]

    Journal of Neural Transmission 124(3), 303–334 (2017)

    Brabenec, L., Mekyska, J., Galaz, Z., Rektorova, I.: Speech disorders in parkinsons disease: early diagnostics and effects of medication and brain stimulation. Journal of Neural Transmission 124(3), 303–334 (2017)

  11. [11]

    Machine learning 45(1), 5–32 (2001)

    Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)

  12. [12]

    In: Proceedings of the 22nd acm sigkdd in- ternational conference on knowledge discovery and data mining, pp

    Chen, T., Guestrin, C.: Xgboost: A scalable tree boost- ing system. In: Proceedings of the 22nd acm sigkdd in- ternational conference on knowledge discovery and data mining, pp. 785–794. ACM (2016)

  13. [13]

    URL: https://keras.io/ (2015)

    Chollet, F., et al.: Keras: Deep learning library for theano and tensorflow. URL: https://keras.io/ (2015)

  14. [14]

    In: Systems and Control (ICSC), 2017 6th International Conference on, pp

    Dahmani, M., Guerti, M.: Vocal folds pathologies clas- sification using na¨ ıve bayes networks. In: Systems and Control (ICSC), 2017 6th International Conference on, pp. 426–432. IEEE (2017)

  15. [15]

    Journal of Voice 11(1), 74–80 (1997)

    De Bodt, M.S., Wuyts, F.L., Van de Heyning, P.H., Croux, C.: Test-retest study of the grbas scale: influence of experience and professional background on perceptual rating of voice quality. Journal of Voice 11(1), 74–80 (1997)

  16. [16]

    Eur Arch Otorhinolaryngol

    Dejonckere, P.H., Bradley, P., Clemente, P., Cornut, G., Crevier-Buchman, L., Friedrich, G., Van De Heyning, P., Remacle, M., Woisard, V.: A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Eur Arch Otorhinolaryngol. 258(2), 77–82 (2001)

  17. [17]

    Computational and mathematical methods in medicine 2015 (2015)

    Eskidere, ¨O., G¨ urhanlı, A.: Voice disorder classification based on multitaper mel frequency cepstral coefficients features. Computational and mathematical methods in medicine 2015 (2015)

  18. [18]

    1.03 (cd-rom)

    Eye, M., Infirmary, E.: Voice disorders database, version. 1.03 (cd-rom). Lincoln Park, NJ: Kay Elemetrics Corpo- ration (1994)

  19. [19]

    J Speech Hear

    Gerratt, B.R., Kreiman, J., Antonanzas-Barroso, N., Berke, G.S.: Comparing internal and external standards in voice quality judgments. J Speech Hear. Res. 36(1), 14–20 (1993)

  20. [20]

    Journal of Voice 24(6), 667– 677 (2010)

    Godino-Llorente, J.I., G´ omez-Vilda, P., Cruz-Rold´ an, F., Blanco-Velasco, M., Fraile, R.: Pathological likelihood in- dex as a measurement of the degree of voice normality and perceived hoarseness. Journal of Voice 24(6), 667– 677 (2010)

  21. [21]

    Advanced Analytics, LLC (2014)

    Gwet, K.L.: Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC (2014)

  22. [22]

    In: Bioinspired Intelligence (IWOBI), 2017 International Conference and Workshop on, pp

    Harar, P., Alonso-Hernandezy, J.B., Mekyska, J., Galaz, Z., Burget, R., Smekal, Z.: Voice pathology detection us- ing deep learning: a preliminary study. In: Bioinspired Intelligence (IWOBI), 2017 International Conference and Workshop on, pp. 1–4. IEEE (2017)

  23. [23]

    Journal of the Royal Statis- tical Society

    Hartigan, J.A., Wong, M.A.: Algorithm as 136: A k- means clustering algorithm. Journal of the Royal Statis- tical Society. Series C (Applied Statistics)28(1), 100–108 (1979)

  24. [24]

    IEEE Intelligent Systems and their applications 13(4), 18–28 (1998)

    Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their applications 13(4), 18–28 (1998)

  25. [25]

    In: Signal Processing Con- ference (EUSIPCO), 2017 25th European, pp

    Hemmerling, D.: Voice pathology distinction using au- toassociative neural networks. In: Signal Processing Con- ference (EUSIPCO), 2017 25th European, pp. 1844–1847. IEEE (2017)

  26. [26]

    Computers in biology and medicine 69, 270–276 (2016)

    Hemmerling, D., Skalski, A., Gajda, J.: Voice data mining for laryngeal pathology assessment. Computers in biology and medicine 69, 270–276 (2016)

  27. [27]

    J Speech Hear Res 39(2), 311–21 (1996)

    Hillenbrand, J., Houde, R.A.: Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech. J Speech Hear Res 39(2), 311–21 (1996)

  28. [28]

    IEEE Access 4, 7806– 7815 (2016)

    Hossain, M.S., Muhammad, G.: Healthcare big data voice pathology assessment framework. IEEE Access 4, 7806– 7815 (2016)

  29. [29]

    Densely Connected Convolutional Networks

    Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016)

  30. [30]

    Neurocomputing 70(1), 489–501 (2006)

    Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learn- ing machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)

  31. [31]

    Adam: A Method for Stochastic Optimization

    Kingma, D., Ba, J.: Adam: A method for stochastic op- timization. arXiv preprint arXiv:1412.6980 (2014)

  32. [32]

    J Speech Hear

    Kreiman, J., Gerratt, B.R., Kempster, G.B., Erman, A., Berke, G.S.: Perceptual evaluation of voice quality: re- view, tutorial, and a framework for future research. J Speech Hear. Res. 36(1), 21–40 (1993)

  33. [33]

    IEEE T Bio-Med Eng 56(4), 1015–1022 (2009)

    Little, M., McSharry, P., Hunter, E., Spielman, J., Ramig, L.: Suitability of dysphonia measurements for telemon- itoring of Parkinson’s disease. IEEE T Bio-Med Eng 56(4), 1015–1022 (2009)

  34. [34]

    In: Data Mining, 2008

    Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pp. 413–422. IEEE (2008)

  35. [35]

    ACM Transactions on Knowledge Dis- covery from Data (TKDD) 6(1), 3 (2012)

    Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation-based ano- maly detection. ACM Transactions on Knowledge Dis- covery from Data (TKDD) 6(1), 3 (2012)

  36. [36]

    In: Advances in Speech and Language Technologies for Iberian Languages, pp

    Mart´ ınez, D., Lleida, E., Ortega, A., Miguel, A., Villalba, J.: Voice pathology detection on the saarbr¨ ucken voice database with calibration and fusion of scores using mul- tifocal toolkit. In: Advances in Speech and Language Technologies for Iberian Languages, pp. 99–109. Springer (2012)

  37. [37]

    Current opinion in otolaryngology & head and neck surgery 16(3), 211 (2008)

    Mehta, D.D., Hillman, R.E.: Voice assessment: updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Current opinion in otolaryngology & head and neck surgery 16(3), 211 (2008)

  38. [38]

    In: 2015 International Work Conference on Bioinspired Intelligence (IWOBI), pp

    Mekyska, J., Galaz, Z., Mzourek, Z., Smekal, Z., Rek- torova, I.: Assessing progress of Parkinson’s using acous- tic analysis of phonation. In: 2015 International Work Conference on Bioinspired Intelligence (IWOBI), pp. 115–122 (2015). DOI 10.1109/IWOBI.2015.7160153

  39. [39]

    Neurocomputing 167, 94–111 (2015)

    Mekyska, J., Janousova, E., Gomez-Vilda, P., Smekal, Z., Rektorova, I., Eliasova, I., Kostalova, M., Mrackova, M., Alonso-Hernandez, J.B., Faundez-Zanuy, M., et al.: Ro- bust and complex approach of pathological speech signal analysis. Neurocomputing 167, 94–111 (2015)

  40. [40]

    Per- ceptual Features as Markers of Parkinson’s Disease: The Issue of Clinical Interpretability, pp

    Mekyska, J., Smekal, Z., Galaz, Z., Mzourek, Z., Rek- torova, I., Faundez-Zanuy, M., L´ opez-de Ipi˜ na, K.: Re- cent Advances in Nonlinear Speech Processing, chap. Per- ceptual Features as Markers of Parkinson’s Disease: The Issue of Clinical Interpretability, pp. 83–91. Springer International Publishing, Cham (2016). DOI 10.1007/ 978-3-319-28109-4 9

  41. [41]

    Journal of healthcare engineering 2017 (2017)

    Mesallam, T.A., Farahat, M., Malki, K.H., Alsulaiman, M., Ali, Z., Al-nasheri, A., Muhammad, G.: Development of the arabic voice pathology database and its evalua- tion by using speech features and machine learning algo- rithms. Journal of healthcare engineering 2017 (2017)

  42. [42]

    Acta Acustica united with Acustica 83(4), 700–706 (1997)

    Michaelis, D., Gramss, T., Strube, H.W.: Glottal-to-noise excitation ratio–a new measure for describing patholog- ical voices. Acta Acustica united with Acustica 83(4), 700–706 (1997)

  43. [43]

    Sensors 17(2), 267 (2017)

    Muhammad, G., Alhamid, M.F., Hossain, M.S., Almo- gren, A.S., Vasilakos, A.V.: Enhanced living by assessing Towards Robust Voice Pathology Detection 11 voice pathology using a co-occurrence matrix. Sensors 17(2), 267 (2017)

  44. [44]

    Biomedical Sig- nal Processing and Control 31, 156–164 (2017)

    Muhammad, G., Alsulaiman, M., Ali, Z., Mesallam, T.A., Farahat, M., Malki, K.H., Al-nasheri, A., Bencherif, M.A.: Voice pathology detection using interlaced deriva- tive pattern on glottal source excitation. Biomedical Sig- nal Processing and Control 31, 156–164 (2017)

  45. [45]

    University of British Columbia (2006)

    Murphy, K.P.: Naive bayes classifiers. University of British Columbia (2006)

  46. [46]

    Folia Phoniatrica et Logopaedica 61(1), 49–56 (2009)

    Oates, J.: Auditory-perceptual evaluation of disordered voice quality. Folia Phoniatrica et Logopaedica 61(1), 49–56 (2009)

  47. [47]

    J Speech Lang

    Parsa, V., Jamieson, D.G.: Identification of pathological voices using glottal noise measures. J Speech Lang. Hear. Res. 23(2), 469–85 (2003)

  48. [48]

    Journal of Ma- chine Learning Research 12, 2825–2830 (2011)

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cour- napeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Ma- chine Learning Research 12, 2825–2830 (2011)

  49. [49]

    Signal Processing 99, 215–249 (2014)

    Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Processing 99, 215–249 (2014)

  50. [50]

    Encyclopedia of biometrics pp

    Reynolds, D.: Gaussian mixture models. Encyclopedia of biometrics pp. 827–832 (2015)

  51. [51]

    International Journal of Elec- trical and Computer Engineering (IJECE) 7(1), 238–243 (2017)

    Sabir, B., Rouda, F., Khazri, Y., Touri, B., Mousse- tad, M.: Improved algorithm for pathological and nor- mal voices identification. International Journal of Elec- trical and Computer Engineering (IJECE) 7(1), 238–243 (2017)

  52. [52]

    Journal of medical imaging and health informatics 4(2), 168–173 (2014)

    Saldanha, J.C., Ananthakrishna, T., Pinto, R.: Vocal fold pathology assessment using mel-frequency cepstral coeffi- cients and linear predictive cepstral coefficients features. Journal of medical imaging and health informatics 4(2), 168–173 (2014)

  53. [53]

    Schalkoff, R.J.: Artificial neural networks, vol. 1. McGraw-Hill New York (1997)

  54. [54]

    In: Principles and Practice of Interventional Pulmonology, pp

    Song, P.: Assessment of vocal cord function and voice disorders. In: Principles and Practice of Interventional Pulmonology, pp. 137–149. Springer (2013)

  55. [55]

    In: Modelling, Identification and Control (ICMIC), 2015 7th International Conference on, pp

    Souissi, N., Cherif, A.: Dimensionality reduction for voice disorders identification system based on mel frequency cepstral coefficients and support vector machine. In: Modelling, Identification and Control (ICMIC), 2015 7th International Conference on, pp. 1–6. IEEE (2015)

  56. [56]

    In: Advanced Technologies for Signal and Image Processing (ATSIP), 2016 2nd International Conference on, pp

    Souissi, N., Cherif, A.: Speech recognition system based on short-term cepstral parameters, feature reduction method and artificial neural networks. In: Advanced Technologies for Signal and Image Processing (ATSIP), 2016 2nd International Conference on, pp. 667–671. IEEE (2016)

  57. [57]

    Journal of Speech, Language, and Hearing Research 54(4), 1011– 1021 (2011)

    Stathopoulos, E.T., Huber, J.E., Sussman, J.E.: Changes in acoustic characteristics of the voice across the life span: measures from individuals 4–93 years of age. Journal of Speech, Language, and Hearing Research 54(4), 1011– 1021 (2011)

  58. [58]

    IEEE Transactions on Acoustics, Speech, and Signal Processing 28(5), 599–601 (1980)

    Teager, H.: Some observations on oral air flow during phonation. IEEE Transactions on Acoustics, Speech, and Signal Processing 28(5), 599–601 (1980)

  59. [59]

    Englewood Cliffs, N.J (1994)

    Titze, I.R.: Principles of voice production. Englewood Cliffs, N.J (1994)

  60. [60]

    Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Nonlinear speech analysis algorithms mapped to a stan- dard metric achieve clinically useful quantification of av- erage Parkinson’s disease symptom severity. J. R. Soc. Interface 8(59), 842–855 (2010)

  61. [61]

    Biomedical Signal Processing and Control 17(Supplement C), 3–10 (2015)

    Uloza, V., Vegiene, A., Saferis, V.: Correlation be- tween the quantitative video laryngostroboscopic mea- surements and parameters of multidimensional voice as- sessment. Biomedical Signal Processing and Control 17(Supplement C), 3–10 (2015)

  62. [62]

    Woldert-Jokisz, B.: Saarbruecken voice database (2007)

  63. [63]

    Audio Spectrogram Representations for Processing with Convolutional Neural Networks

    Wyse, L.: Audio spectrogram representations for process- ing with convolutional neural networks. arXiv preprint arXiv:1706.09559 (2017)