Deep Speaker Feature Learning for Text-independent Speaker Verification

Dong Wang; Lantian Li; Ying Shi; Yixiang Chen; Zhiyuan Tang

arXiv: 1705.03670 · v1 · submitted 2017-05-10 · cs.SD · cs.CL · cs.LG

Abstract

Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the learned features are not yet good enough: just as with raw features, a complex back-end model, either neural or probabilistic, is still needed to handle the residual uncertainty when they are applied to speaker verification. This paper presents a convolutional time-delay deep neural network (CT-DNN) structure for speaker feature learning. Experimental results on the Fisher database demonstrate that the CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds of speech, including the context), the equal error rate (EER) can be as low as 7.68%. This confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames.
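
The abstract reports results as an equal error rate (EER), the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The following is a minimal sketch of how an EER can be estimated from verification trial scores. It assumes that higher scores mean "same speaker" and uses a simple threshold sweep over the observed scores. The function name `equal_error_rate` is hypothetical, and the paper's own scoring tools are not described here.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold (assumption: higher = more
    likely the same speaker). FRR is the fraction of target trials rejected,
    FAR the fraction of non-target trials accepted. The EER is read off at the
    threshold where |FAR - FRR| is smallest, reported as their mean.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / n_tar      # misses
        far = sum(s >= t for s in nontarget_scores) / n_non  # false alarms
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    targets = [0.6, 0.7, 0.8, 0.3]      # toy same-speaker trial scores
    nontargets = [0.1, 0.2, 0.4, 0.65]  # toy different-speaker trial scores
    print(f"EER = {equal_error_rate(targets, nontargets):.2%}")
```

The sweep is quadratic in the number of trials, which is fine for illustration. On a real trial list one would sort the scores once and update FAR and FRR incrementally.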

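The claim that a single feature covers about 0.3 seconds follows from how stacked convolutional and time-delay layers widen the temporal context seen by each output frame. The sketch below computes that context for a hypothetical layer stack. It models only the time axis and assumes a 10 ms frame shift and a 25 ms analysis window, which are common defaults rather than values taken from the paper. The class `TemporalLayer`, the helper functions, and the layer sizes in `HYPOTHETICAL_STACK` are all illustrative and are not the CT-DNN configuration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TemporalLayer:
    """One layer's footprint along the time axis (hypothetical config)."""
    name: str
    kernel: int        # frames combined per output frame
    dilation: int = 1  # spacing between combined frames (time-delay offsets)
    stride: int = 1    # temporal subsampling applied by the layer


def receptive_field(layers: Sequence[TemporalLayer]) -> int:
    """Number of input frames that influence a single output frame."""
    rf, jump = 1, 1  # jump = input-frame distance between adjacent units
    for layer in layers:
        rf += (layer.kernel - 1) * layer.dilation * jump
        jump *= layer.stride
    return rf


def context_seconds(frames: int, shift_ms: float = 10.0,
                    window_ms: float = 25.0) -> float:
    """Audio span covered by `frames` consecutive overlapping analysis frames."""
    return ((frames - 1) * shift_ms + window_ms) / 1000.0


# Illustrative stack only; NOT the layer configuration used in the paper.
# kernel=3, dilation=2 corresponds to splicing frames at offsets {-2, 0, +2}.
HYPOTHETICAL_STACK = [
    TemporalLayer("conv1", kernel=5),
    TemporalLayer("conv2", kernel=5),
    TemporalLayer("tdnn1", kernel=3, dilation=2),
    TemporalLayer("tdnn2", kernel=3, dilation=3),
]

if __name__ == "__main__":
    n = receptive_field(HYPOTHETICAL_STACK)
    print(f"{n} frames -> {context_seconds(n):.3f} s of audio")
```

With this illustrative stack, each output frame depends on 19 input frames, or about 0.2 seconds. Adding or widening layers grows the span. Under the same 10 ms / 25 ms assumption, the 0.3 seconds quoted in the abstract corresponds to roughly 28 frames, consistent with "dozens of frames".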