Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

Dong Wang; Lantian Li; Yiye Lin; Zhiyong Zhang

arXiv: 1506.08349 · v1 · submitted 2015-06-28 · cs.CL · cs.LG · cs.NE

Abstract

A deep learning approach has recently been proposed to derive speaker identity vectors (d-vectors) with a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and has shown reasonable performance gains when combined with the conventional i-vector approach. Although promising, the existing d-vector implementation still cannot compete with the i-vector baseline. This paper presents two improvements to the deep learning approach: a phone-dependent DNN structure that normalizes phone variation, and a new scoring approach based on dynamic time warping (DTW). Experiments on a text-dependent speaker recognition task demonstrated that the proposed methods provide considerable performance improvement over the existing d-vector implementation.
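
The abstract contrasts the existing d-vector scoring with the proposed DTW-based scoring. The Python sketch below illustrates that contrast under stated assumptions, and is not the authors' implementation. Frame-level deep speaker features are given as plain lists of floats (standing in for DNN hidden-layer outputs, which are not computed here). The d-vector baseline is assumed to average those frames into one utterance vector and score with cosine similarity. DTW scoring is assumed to align the two frame sequences with cosine distance as the local cost and to normalize the accumulated cost by n + m. All function names are hypothetical, the paper's exact distance, step pattern, and normalization may differ, and the phone-dependent DNN structure is not modeled.

```python
import math
from typing import List, Sequence

# Hypothetical alias: one frame-level deep speaker feature vector
# (assumed to be, e.g., last-hidden-layer DNN activations for one frame).
Frame = Sequence[float]


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def d_vector(frames: Sequence[Frame]) -> List[float]:
    """Assumed utterance-level d-vector: the element-wise mean of the frame features."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def average_score(enroll: Sequence[Frame], test: Sequence[Frame]) -> float:
    """Baseline scoring: cosine similarity between the two averaged d-vectors."""
    return cosine_similarity(d_vector(enroll), d_vector(test))


def dtw_score(enroll: Sequence[Frame], test: Sequence[Frame]) -> float:
    """Assumed DTW scoring: negative length-normalized cost of the best alignment.

    The local cost is cosine distance (1 - cosine similarity), so a higher
    score means the two frame sequences are more similar.
    """
    n, m = len(enroll), len(test)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning enroll[:i] with test[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - cosine_similarity(enroll[i - 1], test[j - 1])
            # Standard symmetric steps: insertion, deletion, or match.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalize by n + m so utterances of different lengths are comparable.
    return -acc[n][m] / (n + m)
```

On a toy pair of two-frame utterances whose frames appear in opposite order, `e = [[1.0, 0.0], [0.0, 1.0]]` and `r = [[0.0, 1.0], [1.0, 0.0]]`, the averaged d-vectors coincide, so `average_score(e, r)` is 1.0 up to floating-point rounding, while `dtw_score(e, e)` is 0.0 and `dtw_score(e, r)` is -0.5. Averaging discards frame order and DTW keeps it, which is one plausible reason an alignment-based score can help when every speaker is expected to say the same fixed phrase.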
