Unified Hypersphere Embedding for Speaker Recognition

Dengxin Dai; Mahdi Hajibabaei

arxiv: 1807.08312 · v1 · pith:5NEV6D7Wnew · submitted 2018-07-22 · 📡 eess.AS · cs.AI · cs.LG · cs.SD

classification 📡 eess.AS cs.AI cs.LG cs.SD

keywords models verification accuracy complex data dataset deeper embedding

Abstract

Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without extra data or deeper and more complex models, by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions. Results of experiments on the VoxCeleb dataset suggest that: (i) simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%; (ii) lower-dimensional embeddings are more suitable for verification; (iii) use of the proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
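
The augmentation named in finding (i), repetition and random time-reversion of utterances, can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the function name `augment_utterance`, the `target_len` and `reverse_prob` parameters, operating on raw samples rather than spectrograms, and repeating before reversing.

```python
import random


def augment_utterance(samples, target_len, rng, reverse_prob=0.5):
    """Repeat a waveform up to target_len samples, then maybe reverse it in time.

    Hypothetical helper: the paper's exact procedure is not given in the abstract.
    """
    if not samples:
        raise ValueError("expected a non-empty sequence of samples")
    # Repetition: tile the utterance until it covers target_len, then truncate.
    reps = -(-target_len // len(samples))  # ceiling division
    out = (list(samples) * reps)[:target_len]
    # Random time-reversion: flip the time axis with probability reverse_prob.
    if rng.random() < reverse_prob:
        out.reverse()
    return out


rng = random.Random(0)
wave = [0, 1, 2, 3, 4]
fixed = augment_utterance(wave, 12, rng, reverse_prob=0.0)    # never reversed
flipped = augment_utterance(wave, 12, rng, reverse_prob=1.0)  # always reversed
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which is convenient when the same transformation is applied at both training and test time.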

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Deep Neural Network for Short-Segment Speaker Recognition
eess.AS · 2019-07 · unverdicted · novelty 5.0

UtterIdNet is a DNN that delivers consistent speaker recognition on VoxCeleb for segments down to 250 ms, with reported gains over prior models especially at sub-second lengths.