wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2

2026 · cs.SD · arXiv 2606.28857

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

While automatic tools for speech annotation are now commonplace within phonetic research pipelines, many tasks still require substantial manual correction or dedicated training sets before these tools perform accurately. At the same time, large speech models such as wav2vec2 have been shown to perform well at speech classification tasks, raising the question of how these models may be applied to phonetic annotation tasks. We introduce wav2VOT: a tool for the automatic estimation of voice onset time, closure duration, and burst realisation using wav2vec2. We demonstrate that wav2VOT performs comparably with current approaches on unseen datasets, and reaches high accuracy after fine-tuning. Analysis of wav2VOT predictions demonstrates high fidelity across stop voicing and place of articulation. These results demonstrate that large speech models are capable of producing accurate annotations, and further motivate exploration of large speech models as tools in phonetic research pipelines.

representative citing papers

wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2

cs.SD · 2026-06-27 · unverdicted · novelty 5.0

wav2VOT shows wav2vec2 can estimate voice onset time and related stop consonant features with accuracy comparable to existing tools on unseen data and higher accuracy after fine-tuning.

