arxiv: 1811.03055 · v1 · pith:NYR2XYC3new · submitted 2018-11-07 · eess.AS · cs.CV · cs.SD

Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training

classification: eess.AS, cs.CV, cs.SD
keywords: speaker, adversarial, training, able, embeddings, model, task, adapting

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging, real-world problem in biometric security. We further the development of end-to-end speaker embedding models by combining a novel 1-dimensional, self-attentive residual network, an angular margin loss function and an adversarial training strategy. Our model is able to learn extremely compact, 64-dimensional speaker embeddings that deliver competitive performance on a number of popular datasets using simple cosine distance scoring. On the NIST-SRE 2016 task we are able to beat a strong i-vector baseline, while on the Speakers in the Wild task our model was able to outperform both i-vector and x-vector baselines, showing an absolute improvement of 2.19% over the latter. Additionally, we show that the integration of adversarial training consistently leads to a significant improvement over an unadapted model.
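
The cosine scoring step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `cosine_score` and `verify`, the threshold value of 0.5, and the toy 64-dimensional vectors are all assumptions made for illustration. In practice a decision threshold would be tuned on held-out trials.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enroll, test, threshold=0.5):
    """Accept the trial if the score reaches the threshold.

    The default threshold is a hypothetical placeholder, not a value
    taken from the paper.
    """
    return cosine_score(enroll, test) >= threshold


# Toy 64-dimensional embeddings (made-up values, for illustration only).
enroll = [1.0] * 64
same_direction = [2.0] * 64                 # scaled copy of enroll
orthogonal = [1.0] * 32 + [-1.0] * 32       # zero dot product with enroll

print(verify(enroll, same_direction))
print(verify(enroll, orthogonal))
```

Because cosine similarity ignores vector length, the scaled copy scores the same as an identical embedding, which is one reason this scoring needs no separate backend model.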
