Towards Robust Voice Pathology Detection
Pith reviewed 2026-05-24 21:37 UTC · model grok-4.3
The pith
Merging four databases of sustained-vowel phonations lets XGBoost reach an F1 score of 0.733 for pathology detection using acoustic features and MFCCs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Merging four independent databases of normophonic and pathological sustained /a/ phonations and evaluating gradient boosted trees, deep networks, and anomaly detection produces the following best F1 scores on a held-out test set: 0.733 for XGBoost using acoustic features together with MFCCs, 0.621 for DenseNet using MFCCs, and 0.610 for Isolation Forest using acoustic features.
What carries the argument
The merged corpus of four databases of sustained vowel /a/ recordings, represented by acoustic (dysphonic) features and MFCCs, supplied to XGBoost, DenseNet, and Isolation Forest.
If this is right
- XGBoost with the combination of acoustic features and MFCCs delivers the highest supervised classification performance among the three methods tested.
- DenseNet on MFCCs achieves a moderate F1 of 0.621, which could improve with more training data.
- Isolation Forest (F1 0.610) performs close to DenseNet (F1 0.621) without needing labeled pathological examples in training.
- Pooling multiple databases expands coverage of recording conditions and pathology variety beyond any single prior study.
- The reported scores establish a baseline showing that gradient boosting and deep learning are viable for objective voice pathology screening.
Where Pith is reading between the lines
- Performance on entirely new recording hardware or patient populations outside the four databases would need separate validation to confirm robustness.
- The advantage of combining acoustic features with MFCCs suggests that traditional dysphonic measures add information not captured by spectral coefficients alone.
- Extending the same pipeline to other vowels or running speech could test whether results depend on the /a/ phonation task.
- Integration into clinical workflows would require prospective studies that track whether the classifier output correlates with treatment outcomes.
Load-bearing premise
The four databases can be merged into a single training and test distribution without significant domain shift from differing recording equipment, labeling criteria, or patient demographics, and the dedicated test split avoids leakage while remaining representative of unseen pathologies.
What would settle it
Training on three of the databases and testing on the fourth would produce a large drop in F1 score if domain shift between recording conditions prevents generalization.
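The check described above can be sketched as a leave-one-database-out loop. This is a minimal illustration, not the paper's code: `leave_one_database_out`, `evaluate_lodo`, and the `fit_predict` and `score` callables are hypothetical names, and the database labels and toy classifier in the usage lines are placeholders.

```python
from collections import defaultdict


def leave_one_database_out(database_ids):
    """Yield (held_out, train_idx, test_idx), holding out one database at a time."""
    by_db = defaultdict(list)
    for i, db in enumerate(database_ids):
        by_db[db].append(i)
    for held_out in sorted(by_db):
        test_idx = by_db[held_out]
        train_idx = sorted(
            i for db, idx in by_db.items() if db != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def evaluate_lodo(labels, database_ids, fit_predict, score):
    """Return {held_out_database: score} for each leave-one-database-out fold.

    fit_predict(train_idx, test_idx) is a hypothetical callable that trains on
    the training indices and returns predictions for the test indices.
    """
    results = {}
    for held_out, train_idx, test_idx in leave_one_database_out(database_ids):
        y_true = [labels[i] for i in test_idx]
        y_pred = fit_predict(train_idx, test_idx)
        results[held_out] = score(y_true, y_pred)
    return results


# Toy usage with placeholder data (1 = pathological, 0 = normophonic).
database_ids = ["db_a", "db_a", "db_b", "db_c", "db_d", "db_d"]
labels = [1, 0, 1, 1, 0, 1]


def always_pathological(train_idx, test_idx):
    # Stand-in for a real model: predicts the positive class for every recording.
    return [1] * len(test_idx)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


per_database = evaluate_lodo(labels, database_ids, always_pathological, accuracy)
```

Comparing the per-database scores with the pooled test score would indicate how much of the reported performance depends on seeing every recording condition during training.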
Original abstract
Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to be the first to combine four heterogeneous databases of normophonic and pathological sustained /a/ phonations (unrestricted to specific vocal pathologies) and the first to apply gradient boosted trees (XGBoost) and deep learning (DenseNet) to voice pathology detection. It reports best F1 scores on a dedicated test set of 0.733 (XGBoost with AF+MFCC), 0.621 (DenseNet with MFCC), and 0.610 (Isolation Forest with AF), while framing the work as exploratory.
Significance. If the performance claims hold after addressing validation details, the work would be significant as an early demonstration of gradient boosting and CNNs on a large multi-database corpus for non-invasive voice pathology detection. The combination of four databases is a clear strength if domain shift and leakage are controlled; the exploratory framing and use of modern methods (XGBoost, DenseNet) on acoustic features and MFCCs add value over prior single-database studies.
major comments (2)
- [Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.
- [Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.
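One way to attach the error bars that the first major comment asks for is a percentile bootstrap over the test set. The following is a minimal sketch, assuming binary labels with 1 = pathological. `f1_score` and `bootstrap_f1_interval` are illustrative helpers, and the resample count, confidence level, and seed are arbitrary choices rather than values from the paper.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

If several recordings come from the same patient, resampling whole patients instead of individual recordings would be the more conservative choice, since recordings from one speaker are not independent.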
minor comments (1)
- [Abstract] Abstract: the phrasing 'conducted experiments do show promising potential' is slightly awkward and could be tightened for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We agree that additional methodological context is warranted and will revise the abstract accordingly while preserving the paper's exploratory framing.
- Referee: [Abstract] Abstract: the reported F1 scores on the dedicated test set (XGBoost 0.733, etc.) are presented without any description of data partitioning across the four databases, hyperparameter selection, statistical significance testing, or error bars; this directly undermines the central empirical claims.
  Authors: We acknowledge that the abstract omits key experimental details. The full manuscript specifies a patient-level train/test split across the four databases to avoid speaker leakage, with hyperparameter selection performed via cross-validation on the training portion. Statistical significance testing and error bars were omitted due to the exploratory nature of the study. We will revise the abstract to include a concise description of the patient-level dedicated test set and hyperparameter tuning procedure.
  Revision: yes
- Referee: [Abstract] Abstract: the claim of robustness via the combined four-database corpus assumes no significant domain shift from differing recording equipment, labeling criteria, or patient demographics, yet no cross-database normalization, domain-adversarial methods, or explicit patient-/database-level split details are provided to support this.
  Authors: The manuscript already describes the work as exploratory and does not assert that domain shift has been eliminated. We will revise the abstract to clarify that the four-database combination increases diversity but that no cross-database normalization or domain-adversarial training was applied, and that all splits are performed at the patient level. This will temper the robustness claim without altering the reported results.
  Revision: yes
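The patient-level split mentioned in the responses can be sketched as follows. This is a minimal illustration, assuming each recording carries a patient identifier that is unique across the four databases (here by prefixing a database name). `patient_level_split`, the 20% test fraction, and the example IDs are hypothetical and not taken from the manuscript.

```python
import random


def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Split recording indices so that no patient appears in both train and test.

    patient_ids[i] is the (assumed globally unique) patient ID of recording i.
    """
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


# Placeholder IDs: several patients contribute more than one recording.
patient_ids = [
    "db1/p1", "db1/p1", "db1/p2", "db2/p1", "db2/p3",
    "db2/p3", "db3/p7", "db4/p2", "db4/p2", "db4/p9",
]
train_idx, test_idx = patient_level_split(patient_ids)
```

Splitting on patients rather than recordings is what prevents the same speaker from contributing to both training and evaluation. Hyperparameter search would then run only on `train_idx`.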
Circularity Check
No circularity: purely empirical ML results on held-out data
The paper reports F1 scores from training XGBoost, DenseNet, and Isolation Forest on AF/MFCC features extracted from four merged voice databases, evaluated on a dedicated test set. No equations, first-principles derivations, fitted parameters relabeled as predictions, or self-citation chains appear in the abstract or described methodology. All performance numbers are direct measurements from standard supervised and anomaly-detection pipelines; the work is self-contained against external benchmarks with no reduction of claims to inputs by construction.