Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System

Jinkun Chen; Ming Li; Weicheng Cai

arxiv: 1804.05160 · v1 · pith:WC43KRJHnew · submitted 2018-04-14 · eess.AS · cs.LG · cs.SD

Weicheng Cai, Jinkun Chen, Ming Li

classification eess.AS · cs.LG · cs.SD

keywords end-to-end · layer · loss · system · encoding · speaker · function · language

In this paper, we explore the encoding/pooling layer and the loss function in an end-to-end speaker and language recognition system. First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance-level result. In the end-to-end system, the encoding layer aggregates the variable-length input sequence into an utterance-level representation. Besides the basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to obtain the utterance-level representation. In terms of the loss function for open-set speaker verification, center loss and angular softmax loss are introduced in the end-to-end system to obtain more discriminative speaker embeddings. Experimental results on the VoxCeleb and NIST LRE 07 datasets show that the performance of the end-to-end learning system can be significantly improved by the proposed encoding layers and loss functions.
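
To make the role of the encoding layer concrete, the following is a minimal, illustrative sketch of two of the pooling strategies named in the abstract: temporal average pooling and a self-attentive pooling. It is not the authors' implementation. The attention form used here (a tanh projection followed by a softmax over frames) is one common formulation and is an assumption, as are the function names and the parameters `W`, `b`, and `u`. The learnable dictionary encoding layer is not sketched.

```python
import math
import random


def temporal_average_pooling(frames):
    """Average T frame-level feature vectors (each of size D) into one D-vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


def self_attentive_pooling(frames, W, b, u):
    """Attention-weighted average of frame-level features (assumed formulation).

    frames: list of T vectors of size D
    W:      list of H rows, each of size D (hypothetical projection weights)
    b:      list of H biases
    u:      list of H attention-vector entries
    Returns (pooled D-vector, list of T attention weights).
    """
    scores = []
    for f in frames:
        # Hidden representation of one frame: tanh(W f + b).
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, f)) + bias)
            for row, bias in zip(W, b)
        ]
        # Scalar relevance score of this frame.
        scores.append(sum(h * v for h, v in zip(hidden, u)))

    # Softmax over the time axis, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Usage: utterances of different lengths map to vectors of the same size.
random.seed(0)
D, H = 8, 4
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H)]
b = [0.0] * H
u = [random.gauss(0, 1) for _ in range(H)]
for T in (50, 120):
    utterance = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    tap = temporal_average_pooling(utterance)
    sap, attn = self_attentive_pooling(utterance, W, b, u)
    print(T, len(tap), len(sap))
```

In this sketch, both functions return a fixed-size vector regardless of the number of input frames, which is what lets a downstream classifier operate at the utterance level. When the attention vector `u` is all zeros, every frame receives the same weight and the self-attentive pooling reduces to temporal average pooling.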

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable
eess.AS 2026-04 unverdicted novelty 6.0

Speaker recognition networks form hierarchical clusters in latent space that can be matched to semantic classes using a new HCCM algorithm and quantified by Liebig's score.

Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors
eess.AS 2019-07 unverdicted novelty 6.0

Digit-specific HMM i-vectors with uncertainty normalization reach 1.52% male and 1.77% female EER on RSR2015 part III using only that corpus and simple cosine scoring.