wav2VOT shows wav2vec2 can estimate voice onset time and related stop consonant features with accuracy comparable to existing tools on unseen data and higher accuracy after fine-tuning.
wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2
abstract
While automatic tools for speech annotation are now commonplace within phonetic research pipelines, many tasks still require substantial manual correction or dedicated training sets to perform accurately. At the same time, large speech models such as wav2vec2 have been shown to perform well at speech classification tasks, raising the question of how these models may be applied to phonetic annotation tasks. We introduce wav2VOT: a tool for the automatic estimation of voice onset time, closure duration, and burst realisation using wav2vec2. We demonstrate that wav2VOT performs comparably with current approaches on unseen datasets, and that it achieves high accuracy after fine-tuning. Analysis of wav2VOT predictions demonstrates high fidelity across stop voicing and place of articulation. These results demonstrate that large speech models are capable of producing accurate annotations, and further motivate exploration of large speech models as tools in phonetic research pipelines.
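To make the three target measures concrete, the following is a minimal sketch of how they relate to boundary times within a single stop, using the standard phonetic definitions (voice onset time as the interval from burst release to voicing onset, closure duration as the interval from closure onset to burst release). It is not wav2VOT's implementation or output format: the `StopAnnotation` class, its field names, and the `stop_measures` helper are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopAnnotation:
    """Hypothetical boundary times (in seconds) for one stop consonant."""
    closure_onset: float            # start of the oral closure
    burst_onset: Optional[float]    # release burst; None if no burst is realised
    voicing_onset: float            # start of vocal fold vibration


def stop_measures(a: StopAnnotation) -> dict:
    """Derive burst realisation, closure duration, and VOT from boundaries.

    Durations are returned in seconds. When no burst is realised, the two
    burst-relative durations are left undefined (None) in this sketch.
    """
    if a.burst_onset is None:
        return {"burst_realised": False, "closure_duration": None, "vot": None}
    return {
        "burst_realised": True,
        # Closure runs from closure onset up to the release burst.
        "closure_duration": a.burst_onset - a.closure_onset,
        # VOT is positive when voicing lags the burst and negative
        # when voicing starts before it (prevoicing).
        "vot": a.voicing_onset - a.burst_onset,
    }


# Illustrative values only, not data from the paper.
lag_stop = stop_measures(StopAnnotation(0.50, 0.55, 0.62))
prevoiced_stop = stop_measures(StopAnnotation(0.50, 0.58, 0.52))
unreleased_stop = stop_measures(StopAnnotation(0.50, None, 0.60))
```

A model that predicts the boundaries (or the durations directly) can be evaluated against manual annotations by comparing these derived values per token.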
fields
cs.SD
years
2026