A Dataset for Automatic Vocal Mode Classification

Alexander Lange; Jörn Ostermann; Reemt Hinrichs; Sonja Stephan

arxiv: 2601.18339 · v2 · submitted 2026-01-26 · 💻 cs.SD · cs.LG

Pith reviewed 2026-05-16 11:11 UTC · model grok-4.3

keywords: vocal mode classification, Complete Vocal Technique, singing dataset, automatic classification, ResNet18, CVT vocal modes, singing teaching, machine learning

The pith

A new dataset of over 13,000 vocal samples supports automatic classification of four CVT vocal modes, with a ResNet18 baseline reaching 81.3% balanced accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a dataset for classifying vocal modes in singing according to the Complete Vocal Technique. The authors recorded sustained vowels across the full vocal range from four singers using multiple microphones and had them annotated by three experienced experts. They provide baseline results showing that a ResNet18 model achieves 81.3% balanced accuracy in 5-fold cross-validation. This resource addresses the previous lack of data for developing technology to assist singing students in learning specific vocal modes. If successful, such classification could support real-time feedback in singing apps or teaching tools.

Core claim

The paper presents a novel dataset consisting of 3,752 unique sustained vowel samples from four singers, augmented to over 13,000 samples via four microphones, with annotations for the CVT vocal modes Neutral, Curbing, Overdrive, and Edge. Baseline classification using deep learning models like ResNet18 yields a best balanced accuracy of 81.3% across 5-fold cross validation, establishing a performance benchmark for future work on automatic vocal mode classification.
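The headline figure is a balanced accuracy, which is commonly defined as the unweighted mean of per-class recall. The sketch below assumes that definition; the paper's own evaluation code is not reproduced here, and the toy labels are invented purely for illustration. scikit-learn offers an equivalent in `sklearn.metrics.balanced_accuracy_score`.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    per_class_total = Counter(y_true)
    # Count correct predictions separately for each true class.
    per_class_correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [per_class_correct[c] / n for c, n in per_class_total.items()]
    return sum(recalls) / len(recalls)


# Invented, deliberately imbalanced toy labels (not taken from the dataset).
y_true = ["Neutral"] * 6 + ["Curbing"] * 2 + ["Overdrive"] + ["Edge"]
y_pred = ["Neutral"] * 6 + ["Curbing", "Neutral"] + ["Overdrive"] + ["Curbing"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain:.3f}, balanced accuracy: {balanced:.3f}")
```

On imbalanced labels like these, plain accuracy is dominated by the largest class while balanced accuracy weights every mode equally, which is presumably why the paper reports the latter.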

What carries the argument

The merged annotations from three CVT-experienced annotators on the multi-microphone recordings of sustained vowels, which serve as the labeled data for training classifiers.

If this is right

Automatic classification of vocal modes can support technology-assisted singing teaching.
The dataset enables development of models for identifying Neutral, Curbing, Overdrive, and Edge modes.
Multi-microphone setup provides natural data augmentation for improved model robustness.
Baseline results set a standard for comparing future classification approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This dataset could be extended to include full songs or dynamic transitions between modes for more realistic applications.
Integration with mobile apps might allow singers to receive instant feedback on their vocal technique.
The annotation process highlights the subjectivity in vocal mode identification, suggesting potential for consensus-based or probabilistic labeling in future datasets.
Cross-singer generalization might be tested by training on some singers and evaluating on others.
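The probabilistic-labeling idea above could be sketched as follows: each sample's three individual annotations are turned into a distribution over the four modes instead of a single merged label. This is an editorial illustration, not something the paper does; the function name `soft_label` and the example votes are hypothetical.

```python
from collections import Counter

MODES = ("Neutral", "Curbing", "Overdrive", "Edge")


def soft_label(annotations, modes=MODES):
    """Convert one sample's annotator labels into a probability vector."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    unknown = set(counts) - set(modes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    n = len(annotations)
    # Entries follow the order of `modes`.
    return [counts[m] / n for m in modes]


# Hypothetical votes from three annotators for a single sample.
probabilities = soft_label(["Curbing", "Curbing", "Edge"])
print(dict(zip(MODES, probabilities)))
```

A classifier could then be trained against these distributions (for example with a cross-entropy loss on soft targets), so that contested samples contribute less certain supervision than unanimous ones.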

Load-bearing premise

The merged annotations from the three experienced annotators accurately and consistently identify the intended vocal modes in the samples without significant disagreement or influence from recording conditions.

What would settle it

A study where independent CVT experts re-annotate a subset of the samples and find substantial disagreement with the provided merged labels, or where classifiers trained on the dataset fail to generalize to new singers or recording setups.

Figures

Figures reproduced from arXiv: 2601.18339 by Alexander Lange, Jörn Ostermann, Reemt Hinrichs, Sonja Stephan.

Figure 2: Number of samples per subject and empirical cumulative distribution of [...]

Figure 3: Balanced accuracies on the test set across the 5-fold cross validation of all [...]

Figure 4: Balanced accuracy across half-octaves on the test set for the best iterations [...]

Figure 5: Fleiss’ kappa score across cut-off note threshold. The computation of the [...]

Original abstract

The Complete Vocal Technique (CVT) is a school of singing developed in the past decades by Cathrin Sadolin et al. CVT groups the use of the voice into so-called vocal modes, namely Neutral, Curbing, Overdrive and Edge. Knowledge of the desired vocal mode can be helpful for singing students. Automatic classification of vocal modes can thus be important for technology-assisted singing teaching. Previously, automatic classification of vocal modes has been attempted without major success, potentially due to a lack of data. Therefore, we recorded a novel vocal mode dataset consisting of sustained vowels recorded from four singers, three of whom are professional singers with more than five years of CVT experience. The dataset covers the entire vocal range of the subjects, totaling 3,752 unique samples. By using four microphones, thereby offering a natural data augmentation, the dataset consists of more than 13,000 samples combined. An annotation was created using three CVT-experienced annotators, each providing an individual annotation. The merged annotation as well as the three individual annotations come with the published dataset. Additionally, we provide some baseline classification results. The best balanced accuracy across a 5-fold cross validation, 81.3%, was achieved with a ResNet18. The dataset can be downloaded from https://zenodo.org/records/14276415.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper releases a new public dataset for CVT vocal mode classification from sustained vowels, but the 81% baseline is likely inflated by singer leakage in the 5-fold CV.

The paper's main contribution is a new dataset of sustained vowels covering the full vocal range from four singers, captured with four microphones and labeled by three CVT-experienced annotators. They release both the merged labels and the individual annotations, along with over 13,000 samples after the mic augmentation, and they report a ResNet18 baseline at 81.3% balanced accuracy under 5-fold cross-validation. That is genuinely new for this narrow task, where prior work had no sizable public corpus to work with. Releasing the data on Zenodo is the useful part here, and the multi-annotator setup plus full-range coverage gives it more value than a quick recording effort would have on its own. The baseline numbers are at least a starting point for anyone who wants to try models on this data.

The main weakness is the evaluation protocol. With only four singers and no mention of singer-stratified folds or leave-one-singer-out testing, the cross-validation almost certainly mixes samples from the same singer across train and test sets. A ResNet can then exploit stable singer-specific cues like formant structure or glottal source rather than learning the mode distinctions themselves. The abstract and reported results give no details on how folds were constructed or whether class balance was handled, so the 81% figure does not strongly support claims about general automatic classification.

The scope is also narrow (sustained vowels only, four singers), so the dataset will mainly interest people in vocal pedagogy or singing voice analysis rather than broader music AI. A reader who needs labeled examples for mode classification experiments will find it worth downloading and testing. It deserves a serious referee because the data release is concrete and the basic numbers are there, even if the validation needs tightening. I would send it to review and ask for singer-independent results and clearer annotation merging details.
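A singer-independent protocol of the kind requested here could look like the leave-one-singer-out split sketched below. It assumes each sample carries a singer identifier; the identifiers `s1` to `s4` are hypothetical, and nothing here reflects how the paper actually built its folds. scikit-learn's `LeaveOneGroupOut` implements the same idea.

```python
def leave_one_singer_out(singer_ids):
    """Yield (held_out_singer, train_indices, test_indices) per singer.

    Every sample of the held-out singer goes to the test set, so no
    singer appears on both sides of a split.
    """
    for held_out in sorted(set(singer_ids)):
        test = [i for i, s in enumerate(singer_ids) if s == held_out]
        train = [i for i, s in enumerate(singer_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical singer IDs, one entry per sample.
singer_ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s2"]
folds = list(leave_one_singer_out(singer_ids))

for held_out, train, test in folds:
    print(f"held out {held_out}: {len(train)} train / {len(test)} test samples")
```

With four singers this yields four folds. Comparing balanced accuracy under this split with the reported 5-fold figure would show how much of the baseline depends on having seen the test singer during training.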

Referee Report

2 major / 2 minor

Summary. The paper introduces a new dataset of sustained vowels from four singers (three professional with CVT experience) for classifying Complete Vocal Technique modes (Neutral, Curbing, Overdrive, Edge). Recordings use four microphones for natural augmentation (>13k samples total), with annotations from three CVT-experienced annotators (individual and merged versions released). Baseline supervised classification reports 81.3% balanced accuracy via 5-fold CV on a ResNet18.

Significance. A publicly released, multi-microphone vocal-mode dataset with expert annotations would address the acknowledged data scarcity in this subfield and support development of tools for singing pedagogy. The multi-annotator design and release of raw annotations are strengths that enable future work on label uncertainty. However, the baseline's evidential value for dataset utility depends on whether the reported accuracy reflects mode discrimination rather than singer identification.

major comments (2)

[Baseline results] Baseline results paragraph: The 5-fold cross-validation protocol is not described as stratified by singer (or using leave-one-singer-out). With only four singers total, folds almost certainly mix samples from the same singer across train and test sets, allowing a ResNet18 to exploit stable singer-specific timbral cues (formant structure, glottal source) rather than CVT mode distinctions. This directly weakens support for the 81.3% figure as evidence of the dataset's utility for general automatic classification.
[Dataset annotation] Dataset annotation subsection: No description is given of how the merged annotation was constructed from the three individual annotations (e.g., majority vote threshold, tie-breaking rule, or exclusion of high-disagreement samples), and no inter-annotator agreement statistics (Cohen's kappa, percentage agreement) are reported. This leaves the ground-truth quality unverifiable and is load-bearing for any downstream classification claims.
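For three annotators, a multi-rater statistic such as Fleiss' kappa (which the caption of Figure 5 mentions) is a natural companion to pairwise Cohen's kappa. The sketch below implements the standard Fleiss' kappa formula in plain Python; the ratings are invented toy data, and returning 1.0 when all ratings fall into one category is a convention of this sketch, since kappa is undefined in that case.

```python
from collections import Counter

MODES = ("Neutral", "Curbing", "Overdrive", "Edge")


def fleiss_kappa(ratings, categories=MODES):
    """Fleiss' kappa for a list of per-sample label lists.

    Every sample must be rated by the same number of annotators.
    """
    if not ratings:
        raise ValueError("at least one rated sample is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each sample needs the same number (>= 2) of ratings")
    totals = Counter()
    observed = 0.0
    for sample in ratings:
        counts = Counter(sample)
        if set(counts) - set(categories):
            raise ValueError("rating outside the given categories")
        totals.update(counts)
        # Fraction of annotator pairs that agree on this sample.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = observed / len(ratings)
    n_ratings = len(ratings) * n_raters
    # Chance agreement from the overall category proportions.
    p_expected = sum((totals[c] / n_ratings) ** 2 for c in categories)
    if p_expected == 1.0:
        return 1.0  # undefined case; convention of this sketch
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented ratings: four samples, three annotators each.
toy_ratings = [
    ["Neutral", "Neutral", "Neutral"],
    ["Curbing", "Curbing", "Edge"],
    ["Overdrive", "Overdrive", "Overdrive"],
    ["Edge", "Edge", "Curbing"],
]
kappa = fleiss_kappa(toy_ratings)
print(f"Fleiss' kappa: {kappa:.3f}")
```

Because the individual annotations are released with the dataset, a reader could run this kind of check on the real labels without waiting for a revised manuscript.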

minor comments (2)

[Abstract] Abstract: The total sample count after augmentation is stated as 'more than 13,000' but the exact breakdown by mode, singer, and microphone is not summarized, making it harder to assess class balance or coverage of the vocal range.
[Dataset release] The Zenodo link is provided but the manuscript does not list the exact file structure or README contents that accompany the released annotations and raw recordings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the manuscript to strengthen the presentation of the baseline results and annotation process.

Referee: [Baseline results] Baseline results paragraph: The 5-fold cross-validation protocol is not described as stratified by singer (or using leave-one-singer-out). With only four singers total, folds almost certainly mix samples from the same singer across train and test sets, allowing a ResNet18 to exploit stable singer-specific timbral cues (formant structure, glottal source) rather than CVT mode distinctions. This directly weakens support for the 81.3% figure as evidence of the dataset's utility for general automatic classification.

Authors: We agree that the 5-fold CV protocol as described does not isolate singer identity and that, with only four singers, the model could exploit singer-specific cues. To provide stronger evidence of the dataset's utility for mode classification, we will add leave-one-singer-out (LOSO) cross-validation results to the revised manuscript. These will be reported alongside the existing 5-fold results for direct comparison, using the same ResNet18 architecture and balanced accuracy metric. revision: yes

Referee: [Dataset annotation] Dataset annotation subsection: No description is given of how the merged annotation was constructed from the three individual annotations (e.g., majority vote threshold, tie-breaking rule, or exclusion of high-disagreement samples), and no inter-annotator agreement statistics (Cohen's kappa, percentage agreement) are reported. This leaves the ground-truth quality unverifiable and is load-bearing for any downstream classification claims.

Authors: We will expand the Dataset annotation subsection to explicitly describe the merging procedure: a majority-vote rule across the three annotators, with ties resolved by selecting the label from the annotator with the most CVT teaching experience. We will also compute and report inter-annotator agreement using both percentage agreement and Cohen's kappa on the individual annotations. The already-released individual annotations enable users to perform additional uncertainty analyses. revision: yes
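The merging rule described in this simulated response (majority vote, with ties going to the most experienced annotator) can be sketched as below. This illustrates the rule as stated in the rebuttal text only; it is not confirmed as the procedure used to build the released merged annotation, and the `senior_index` parameter is a hypothetical way of marking the tie-breaking annotator.

```python
from collections import Counter


def merge_annotations(labels, senior_index=0):
    """Merge one sample's annotator labels by majority vote.

    `labels` holds one label per annotator. With three annotators a tie
    can only occur when all three disagree, in which case the label of
    the annotator at `senior_index` is used.
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    top_count = max(counts.values())
    winners = [label for label, c in counts.items() if c == top_count]
    if len(winners) == 1:
        return winners[0]
    return labels[senior_index]


# Hypothetical examples: a clear majority and a three-way tie.
majority = merge_annotations(["Curbing", "Edge", "Curbing"])
tie_broken = merge_annotations(["Neutral", "Curbing", "Edge"], senior_index=2)
print(majority, tie_broken)
```

Samples that reach the tie-break branch are the ones with full disagreement, so logging them separately would give a direct count of the hardest cases in the annotation.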

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper presents an empirical dataset of recorded sustained vowels from four singers, with annotations from three CVT-experienced annotators, followed by standard supervised classification baselines (ResNet18 achieving 81.3% balanced accuracy via 5-fold CV). No equations, fitted parameters, or predictions are defined in terms of themselves; the reported accuracy is a direct empirical result on the collected data rather than a quantity forced by construction or self-citation. The work contains no self-citation load-bearing steps, uniqueness theorems, or ansatzes that reduce the central claims to prior author work. The derivation chain is self-contained against external benchmarks (new recordings and standard ML evaluation).

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The contribution is empirical data collection plus standard ML baselines; no new theoretical entities, fitted constants, or ad-hoc axioms are introduced beyond ordinary assumptions of audio classification.

axioms (1)

[standard math] Standard assumptions of supervised audio classification (i.i.d. samples, consistent labeling, convolutional networks suitable for spectrogram inputs). Invoked implicitly when applying ResNet18 to the audio data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "The best balanced accuracy across a 5-fold cross validation of 81.3% was achieved with a ResNet18."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "CVT groups the use of the voice into so called vocal modes, namely Neutral, Curbing, Overdrive and Edge."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

