UtterIdNet is a DNN that delivers consistent speaker recognition on VoxCeleb for segments down to 250 ms, with reported gains over prior models especially at sub-second lengths.
A Deep Neural Network for Short-Segment Speaker Recognition
1 Pith paper cites this work.
abstract
Today's interactive devices, such as smartphone assistants and smart speakers, often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices are better served by models that can perform the recognition task on short-duration utterances. In this paper, a new deep neural network, UtterIdNet, capable of performing speaker recognition with short speech segments is proposed. The proposed model uses a novel architecture that makes it suitable for short-segment speaker recognition through more efficient use of the information in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations for different segment durations show consistent and stable performance for short segments, with significant improvement over the previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).
fields
eess.AS (1)
years
2019 (1)
verdicts
UNVERDICTED (1)