Towards Robust Voice Pathology Detection

Jesus B. Alonso-Hernandez; Jiri Mekyska; Pavol Harar; Radim Burget; Zdenek Smekal; Zoltan Galaz

arXiv: 1907.06129 · v1 · pith:S67QZSI6new · submitted 2019-07-13 · cs.SD · cs.LG · eess.AS

Pavol Harar, Zoltan Galaz, Jesus B. Alonso-Hernandez, Jiri Mekyska, Radim Burget, Zdenek Smekal

Pith reviewed 2026-05-24 21:37 UTC · model grok-4.3

classification cs.SD · cs.LG · eess.AS

keywords voice pathology detection · acoustic features · MFCC · XGBoost · DenseNet · Isolation Forest · sustained phonation · vowel /a/

The pith

Merging four databases of vowel phonations lets XGBoost reach F1 0.733 for pathology detection using acoustic features and MFCCs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether classifiers from supervised learning and anomaly detection can identify pathological voices from acoustic recordings of the sustained vowel /a/. It combines four separate databases containing both normophonic and pathological samples without limiting the set of pathologies. Experiments evaluate raw waveforms, spectrograms, MFCCs, and conventional acoustic features across XGBoost, DenseNet, and Isolation Forest. The work presents itself as the first to pool these databases and to apply gradient boosting and deep learning in this setting. The highest F1 score on a dedicated test set is 0.733 with XGBoost, indicating that the combined data and chosen methods can support more general detection than prior single-database studies.

Core claim

Merging four independent databases of normophonic and pathological sustained /a/ phonations and evaluating gradient boosted trees, deep networks, and anomaly detection produces the following best F1 scores on a held-out test set: 0.733 for XGBoost using acoustic features together with MFCCs, 0.621 for DenseNet using MFCCs, and 0.610 for Isolation Forest using acoustic features.
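For reference, the F1 score used in these comparisons is the harmonic mean of precision and recall. The sketch below shows the computation from confusion counts; the function name and the counts are invented for illustration and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 when it is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, with the pathological class treated as positive.
example = f1_score(tp=80, fp=30, fn=20)
print(round(example, 3))
```

Because F1 ignores true negatives, the score depends on which class is treated as positive, so that choice should be stated alongside any reported value.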

What carries the argument

The merged corpus of four databases of sustained vowel /a/ recordings, represented by acoustic (dysphonic) features and MFCCs, supplied to XGBoost, DenseNet, and Isolation Forest.

If this is right

XGBoost with the combination of acoustic features and MFCCs delivers the highest supervised classification performance among the three methods tested.
DenseNet on MFCCs achieves moderate results that could scale with additional data volume.
Isolation Forest reaches comparable performance without needing labeled pathological examples in training.
Pooling multiple databases expands coverage of recording conditions and pathology variety beyond any single prior study.
The reported scores establish a baseline showing that gradient boosting and deep learning are viable for objective voice pathology screening.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Performance on entirely new recording hardware or patient populations outside the four databases would need separate validation to confirm robustness.
The advantage of combining acoustic features with MFCCs suggests that traditional dysphonic measures add information not captured by spectral coefficients alone.
Extending the same pipeline to other vowels or running speech could test whether results depend on the /a/ phonation task.
Integration into clinical workflows would require prospective studies that track whether the classifier output correlates with treatment outcomes.

Load-bearing premise

The premise is that the four databases can be merged into a single training and test distribution without significant domain shift from differing recording equipment, labeling criteria, or patient demographics, and that the dedicated test split avoids leakage while remaining representative of unseen pathologies.
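One way to guard against the leakage mentioned here is to split by patient rather than by recording. The following is a minimal sketch, assuming each recording carries a patient identifier; the function name, the identifiers, and the 20% test fraction are hypothetical and do not describe the paper's actual procedure.

```python
import random


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, sample) pairs so that no patient is in both sets."""
    patients = sorted({patient_id for patient_id, _ in records})
    random.Random(seed).shuffle(patients)

    # Hold out whole patients, never individual recordings.
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])

    train = [(p, s) for p, s in records if p not in test_ids]
    test = [(p, s) for p, s in records if p in test_ids]
    return train, test


# Hypothetical corpus: 10 patients with 3 recordings each.
records = [(f"patient_{i}", f"rec_{i}_{j}") for i in range(10) for j in range(3)]
train, test = patient_level_split(records)
```

Splitting on the patient identifier keeps every recording of a held-out speaker out of training, which a random split over recordings would not guarantee.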

What would settle it

Training on three of the databases and testing on the fourth would produce a large drop in F1 score if domain shift between recording conditions prevents generalization.
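The leave-one-database-out check described above can be sketched as follows. The database labels are placeholders, since the four databases are not named in this summary, and the helper name is hypothetical.

```python
def leave_one_database_out(database_labels):
    """Yield (held_out, train_idx, test_idx) once per distinct database."""
    for held_out in sorted(set(database_labels)):
        train_idx = [i for i, d in enumerate(database_labels) if d != held_out]
        test_idx = [i for i, d in enumerate(database_labels) if d == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical per-recording database labels.
labels = ["db_a", "db_a", "db_b", "db_c", "db_c", "db_d"]
folds = list(leave_one_database_out(labels))

for held_out, train_idx, test_idx in folds:
    print(held_out, len(train_idx), len(test_idx))
```

Comparing the F1 score of each fold against the score from a pooled split would indicate how much of the reported performance depends on seeing every database during training.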

Figures

Figures reproduced from arXiv: 1907.06129 by Jesus B. Alonso-Hernandez, Jiri Mekyska, Pavol Harar, Radim Burget, Zdenek Smekal, Zoltan Galaz.

**Figure 1.** Visualization of inequality of samples per vocal pathology in the datasets used in this work (only 5 most [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]

Original abstract

Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper explores combining four voice databases for pathology detection with XGBoost, DenseNet and Isolation Forest but provides no details on splits or domain handling, so the F1 numbers are hard to trust.

The main takeaway is that this work merges four databases of sustained /a/ phonations and tests modern classifiers, reporting best F1 of 0.733 with XGBoost on AF plus MFCC features, 0.621 with DenseNet on MFCC, and 0.610 with Isolation Forest on AF. It positions itself as the first to combine those four unrestricted sets and the first to apply gradient boosting and deep learning here. That claim of novelty looks legitimate for the narrow subfield of medical audio processing. The experiments also cover a sensible range of inputs including raw waveforms, spectrograms, MFCCs and conventional acoustic features, plus both supervised and anomaly detection setups. That breadth is useful as an initial scan of what works on this kind of data.

The soft spot is the data-merging step. The abstract gives no information on whether the test set is patient-disjoint or database-disjoint, whether any cross-database normalization was applied, or how domain shift from different equipment and labeling was checked. Without those controls the reported scores could reflect recording artifacts rather than pathology detection. No error bars, significance tests or hyperparameter details are mentioned either. The stress-test concern about unaddressed domain shift therefore lands directly on the central claim.

This paper is mainly for people already working on voice pathology screening who need a quick empirical baseline on multi-database setups. A reader could pull the feature and classifier ideas as starting points, but would have to redo the validation themselves. It is coherent enough on its own terms to deserve peer review so that referees can check the actual data-handling section and see whether the numbers survive scrutiny.

Referee Report

2 major / 1 minor

Summary. The paper claims to be the first to combine four heterogeneous databases of normophonic and pathological sustained /a/ phonations (unrestricted to specific vocal pathologies) and the first to apply gradient boosted trees (XGBoost) and deep learning (DenseNet) to voice pathology detection. It reports best F1 scores on a dedicated test set of 0.733 (XGBoost with AF+MFCC), 0.621 (DenseNet with MFCC), and 0.610 (Isolation Forest with AF), while framing the work as exploratory.

Significance. If the performance claims hold after addressing validation details, the work would be significant as an early demonstration of gradient boosting and CNNs on a large multi-database corpus for non-invasive voice pathology detection. The combination of four databases is a clear strength if domain shift and leakage are controlled; the exploratory framing and use of modern methods (XGBoost, DenseNet) on acoustic features and MFCCs add value over prior single-database studies.

major comments (2)

[Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.
[Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.
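The error bars requested in the first major comment could be estimated with a percentile bootstrap over the test set. The sketch below is one possible approach, not something the paper reports; the labels and predictions are invented, and the helper names are hypothetical.

```python
import random


def f1(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive (pathological) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical test set: 30 pathological and 30 normophonic recordings.
y_true = [1] * 30 + [0] * 30
y_pred = [1] * 22 + [0] * 8 + [0] * 20 + [1] * 10
lo, hi = bootstrap_f1_interval(y_true, y_pred)
```

If a patient contributes several recordings, resampling whole patients instead of single recordings would be the more defensible variant, since recordings from one speaker are not independent.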

minor comments (1)

[Abstract] Abstract: the phrasing 'conducted experiments do show promising potential' is slightly awkward and could be tightened for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that additional methodological context is warranted and will revise the abstract accordingly while preserving the paper's exploratory framing.

Referee: [Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.

Authors: We acknowledge that the abstract omits key experimental details. The full manuscript specifies a patient-level train/test split across the four databases to avoid speaker leakage, with hyperparameter selection performed via cross-validation on the training portion. Statistical significance testing and error bars were omitted due to the exploratory nature of the study. We will revise the abstract to include a concise description of the patient-level dedicated test set and hyperparameter tuning procedure.

Revision: yes

Referee: [Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.

Authors: The manuscript already describes the work as exploratory and does not assert that domain shift has been eliminated. We will revise the abstract to clarify that the four-database combination increases diversity but that no cross-database normalization or domain-adversarial training was applied, and that all splits are performed at the patient level. This will temper the robustness claim without altering the reported results.

Revision: yes
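For readers unfamiliar with the term, the cross-database normalization discussed in this exchange could be as simple as z-scoring each feature within its source database. The sketch below illustrates that idea on a single feature; it is an assumption about what such a step might look like, with invented values and a hypothetical function name, and it is not part of the paper's pipeline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_database(values, database_labels):
    """Z-score one feature separately within each source database."""
    grouped = defaultdict(list)
    for v, d in zip(values, database_labels):
        grouped[d].append(v)

    # Per-database mean and population standard deviation.
    stats = {d: (mean(vs), pstdev(vs)) for d, vs in grouped.items()}

    normalized = []
    for v, d in zip(values, database_labels):
        mu, sigma = stats[d]
        normalized.append((v - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical feature values with a database-specific offset of 100.
values = [1.0, 2.0, 3.0, 101.0, 102.0, 103.0]
labels = ["db_a"] * 3 + ["db_b"] * 3
z = normalize_per_database(values, labels)
```

This removes a constant offset between databases, but it can also erase genuine differences when the proportion of pathological recordings varies between databases. In a real experiment the statistics would need to come from training data only.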

Circularity Check

0 steps flagged

No circularity: purely empirical ML results on held-out data

The paper reports F1 scores from training XGBoost, DenseNet, and Isolation Forest on AF/MFCC features extracted from four merged voice databases, evaluated on a dedicated test set. No equations, first-principles derivations, fitted parameters relabeled as predictions, or self-citation chains appear in the abstract or described methodology. All performance numbers are direct measurements from standard supervised and anomaly-detection pipelines; the work is self-contained against external benchmarks with no reduction of claims to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical machine learning study on audio classification; no explicit free parameters beyond standard model hyperparameters, no domain axioms, and no invented entities are described in the abstract.

[1] [1]

In: Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Confer- ence on, pp

Al-nasheri, A., Ali, Z., Muhammad, G., Alsulaiman, M.: Voice pathology detection using auto-correlation of diﬀer- ent ﬁlters bank. In: Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Confer- ence on, pp. 50–55. IEEE (2014)

work page 2014

[2] [2]

Journal of Voice 31(1), 3–15 (2017)

Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z.: Investigation of voice pathology detection and classiﬁca- tion on diﬀerent frequency regions using correlation func- tions. Journal of Voice 31(1), 3–15 (2017)

work page 2017

[3] [3]

IEEE Ac- cess PP(99), 1–1 (2017)

Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z., Malki, K., Mesallam, T., Farahat, M.: Voice pathology detection and classiﬁcation using auto-correlation and entropy features in diﬀerent frequency regions. IEEE Ac- cess PP(99), 1–1 (2017). DOI 10.1109/ACCESS.2017. 2696056

work page doi:10.1109/access.2017 2017

[4] [4]

Journal of Voice 31(1), 113–e9 (2017)

Al-nasheri, A., Muhammad, G., Alsulaiman, M., Ali, Z., Mesallam, T.A., Farahat, M., Malki, K.H., Bencherif, M.A.: An investigation of multidimensional voice pro- gram parameters in three diﬀerent databases for voice pathology detection and classiﬁcation. Journal of Voice 31(1), 113–e9 (2017)

work page 2017

[5] [5]

Ali, Z., Alsulaiman, M., Muhammad, G., Elamvazuthi, I., Al-nasheri, A., Mesallam, T.A., Farahat, M., Malki, K.H.: Intra-and inter-database study for arabic, english, and german databases: Do conventional speech features detect voice pathology? Journal of Voice 31(3), 386–e1 (2017)

work page 2017

[6] Ali, Z., Muhammad, G., Alhamid, M.F.: An automatic health monitoring system for patients suffering from voice complications in smart cities. IEEE Access 5, 3900–3908 (2017)

[7] Amami, R., Smiti, A.: An incremental method combining density clustering and support vector machines for voice pathology detection. Computers & Electrical Engineering 57, 257–265 (2017)

[8] Arias-Londoño, J.D., Godino-Llorente, J.I., Markaki, M., Stylianou, Y.: On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices. Logopedics Phoniatrics Vocology 36(2), 60–69 (2011)

[9] Armstrong, D., Gosling, A., Weinman, J., Marteau, T.: The place of inter-rater reliability in qualitative research: an empirical study. Sociology 31(3), 597–606 (1997)

[10] Brabenec, L., Mekyska, J., Galaz, Z., Rektorova, I.: Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation. Journal of Neural Transmission 124(3), 303–334 (2017)

[11] Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)

[12] Chen, T., Guestrin, C.: Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794. ACM (2016)

[13] Chollet, F., et al.: Keras: Deep learning library for theano and tensorflow. URL: https://keras.io/ (2015)

[14] Dahmani, M., Guerti, M.: Vocal folds pathologies classification using naïve bayes networks. In: Systems and Control (ICSC), 2017 6th International Conference on, pp. 426–432. IEEE (2017)

[15] De Bodt, M.S., Wuyts, F.L., Van de Heyning, P.H., Croux, C.: Test-retest study of the grbas scale: influence of experience and professional background on perceptual rating of voice quality. Journal of Voice 11(1), 74–80 (1997)

[16] Dejonckere, P.H., Bradley, P., Clemente, P., Cornut, G., Crevier-Buchman, L., Friedrich, G., Van De Heyning, P., Remacle, M., Woisard, V.: A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Eur Arch Otorhinolaryngol. 258(2), 77–82 (2001)

[17] Eskidere, Ö., Gürhanlı, A.: Voice disorder classification based on multitaper mel frequency cepstral coefficients features. Computational and mathematical methods in medicine 2015 (2015)

[18] Eye, M., Infirmary, E.: Voice disorders database, version 1.03 (cd-rom). Lincoln Park, NJ: Kay Elemetrics Corporation (1994)

[19] Gerratt, B.R., Kreiman, J., Antonanzas-Barroso, N., Berke, G.S.: Comparing internal and external standards in voice quality judgments. J Speech Hear. Res. 36(1), 14–20 (1993)

[20] Godino-Llorente, J.I., Gómez-Vilda, P., Cruz-Roldán, F., Blanco-Velasco, M., Fraile, R.: Pathological likelihood index as a measurement of the degree of voice normality and perceived hoarseness. Journal of Voice 24(6), 667–677 (2010)

[21] Gwet, K.L.: Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC (2014)

[22] Harar, P., Alonso-Hernandez, J.B., Mekyska, J., Galaz, Z., Burget, R., Smekal, Z.: Voice pathology detection using deep learning: a preliminary study. In: Bioinspired Intelligence (IWOBI), 2017 International Conference and Workshop on, pp. 1–4. IEEE (2017)

[23] Hartigan, J.A., Wong, M.A.: Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1), 100–108 (1979)

[24] Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their applications 13(4), 18–28 (1998)

[25] Hemmerling, D.: Voice pathology distinction using autoassociative neural networks. In: Signal Processing Conference (EUSIPCO), 2017 25th European, pp. 1844–1847. IEEE (2017)

[26] Hemmerling, D., Skalski, A., Gajda, J.: Voice data mining for laryngeal pathology assessment. Computers in biology and medicine 69, 270–276 (2016)

[27] Hillenbrand, J., Houde, R.A.: Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech. J Speech Hear Res 39(2), 311–21 (1996)

[28] Hossain, M.S., Muhammad, G.: Healthcare big data voice pathology assessment framework. IEEE Access 4, 7806–7815 (2016)

[29] Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016)

[30] Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)

[31] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[32] Kreiman, J., Gerratt, B.R., Kempster, G.B., Erman, A., Berke, G.S.: Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. J Speech Hear. Res. 36(1), 21–40 (1993)

[33] Little, M., McSharry, P., Hunter, E., Spielman, J., Ramig, L.: Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE T Bio-Med Eng 56(4), 1015–1022 (2009)

[34] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pp. 413–422. IEEE (2008)

[35] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD) 6(1), 3 (2012)

[36] Martínez, D., Lleida, E., Ortega, A., Miguel, A., Villalba, J.: Voice pathology detection on the saarbrücken voice database with calibration and fusion of scores using multifocal toolkit. In: Advances in Speech and Language Technologies for Iberian Languages, pp. 99–109. Springer (2012)

[37] Mehta, D.D., Hillman, R.E.: Voice assessment: updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Current opinion in otolaryngology & head and neck surgery 16(3), 211 (2008)

[38] Mekyska, J., Galaz, Z., Mzourek, Z., Smekal, Z., Rektorova, I.: Assessing progress of Parkinson’s using acoustic analysis of phonation. In: 2015 International Work Conference on Bioinspired Intelligence (IWOBI), pp. 115–122 (2015). DOI 10.1109/IWOBI.2015.7160153

[39] Mekyska, J., Janousova, E., Gomez-Vilda, P., Smekal, Z., Rektorova, I., Eliasova, I., Kostalova, M., Mrackova, M., Alonso-Hernandez, J.B., Faundez-Zanuy, M., et al.: Robust and complex approach of pathological speech signal analysis. Neurocomputing 167, 94–111 (2015)

[40] Mekyska, J., Smekal, Z., Galaz, Z., Mzourek, Z., Rektorova, I., Faundez-Zanuy, M., López-de Ipiña, K.: Recent Advances in Nonlinear Speech Processing, chap. Perceptual Features as Markers of Parkinson’s Disease: The Issue of Clinical Interpretability, pp. 83–91. Springer International Publishing, Cham (2016). DOI 10.1007/978-3-319-28109-4_9

[41] Mesallam, T.A., Farahat, M., Malki, K.H., Alsulaiman, M., Ali, Z., Al-nasheri, A., Muhammad, G.: Development of the arabic voice pathology database and its evaluation by using speech features and machine learning algorithms. Journal of healthcare engineering 2017 (2017)

[42] Michaelis, D., Gramss, T., Strube, H.W.: Glottal-to-noise excitation ratio–a new measure for describing pathological voices. Acta Acustica united with Acustica 83(4), 700–706 (1997)

[43] Muhammad, G., Alhamid, M.F., Hossain, M.S., Almogren, A.S., Vasilakos, A.V.: Enhanced living by assessing voice pathology using a co-occurrence matrix. Sensors 17(2), 267 (2017)

[44] Muhammad, G., Alsulaiman, M., Ali, Z., Mesallam, T.A., Farahat, M., Malki, K.H., Al-nasheri, A., Bencherif, M.A.: Voice pathology detection using interlaced derivative pattern on glottal source excitation. Biomedical Signal Processing and Control 31, 156–164 (2017)

[45] Murphy, K.P.: Naive bayes classifiers. University of British Columbia (2006)

[46] Oates, J.: Auditory-perceptual evaluation of disordered voice quality. Folia Phoniatrica et Logopaedica 61(1), 49–56 (2009)

[47] Parsa, V., Jamieson, D.G.: Identification of pathological voices using glottal noise measures. J Speech Lang. Hear. Res. 23(2), 469–85 (2003)

[48] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)

[49] Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Processing 99, 215–249 (2014)

[50] Reynolds, D.: Gaussian mixture models. Encyclopedia of biometrics pp. 827–832 (2015)

[51] Sabir, B., Rouda, F., Khazri, Y., Touri, B., Moussetad, M.: Improved algorithm for pathological and normal voices identification. International Journal of Electrical and Computer Engineering (IJECE) 7(1), 238–243 (2017)

[52] Saldanha, J.C., Ananthakrishna, T., Pinto, R.: Vocal fold pathology assessment using mel-frequency cepstral coefficients and linear predictive cepstral coefficients features. Journal of medical imaging and health informatics 4(2), 168–173 (2014)

[53] Schalkoff, R.J.: Artificial neural networks, vol. 1. McGraw-Hill New York (1997)

[54] Song, P.: Assessment of vocal cord function and voice disorders. In: Principles and Practice of Interventional Pulmonology, pp. 137–149. Springer (2013)

[55] Souissi, N., Cherif, A.: Dimensionality reduction for voice disorders identification system based on mel frequency cepstral coefficients and support vector machine. In: Modelling, Identification and Control (ICMIC), 2015 7th International Conference on, pp. 1–6. IEEE (2015)

[56] Souissi, N., Cherif, A.: Speech recognition system based on short-term cepstral parameters, feature reduction method and artificial neural networks. In: Advanced Technologies for Signal and Image Processing (ATSIP), 2016 2nd International Conference on, pp. 667–671. IEEE (2016)

[57] Stathopoulos, E.T., Huber, J.E., Sussman, J.E.: Changes in acoustic characteristics of the voice across the life span: measures from individuals 4–93 years of age. Journal of Speech, Language, and Hearing Research 54(4), 1011–1021 (2011)

[58] Teager, H.: Some observations on oral air flow during phonation. IEEE Transactions on Acoustics, Speech, and Signal Processing 28(5), 599–601 (1980)

[59] Titze, I.R.: Principles of voice production. Englewood Cliffs, N.J (1994)

[60] Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity. J. R. Soc. Interface 8(59), 842–855 (2010)

[61] Uloza, V., Vegiene, A., Saferis, V.: Correlation between the quantitative video laryngostroboscopic measurements and parameters of multidimensional voice assessment. Biomedical Signal Processing and Control 17(Supplement C), 3–10 (2015)

[62] Woldert-Jokisz, B.: Saarbruecken voice database (2007)

[63] Wyse, L.: Audio spectrogram representations for processing with convolutional neural networks. arXiv preprint arXiv:1706.09559 (2017)