L-Proto improves multilingual speaker verification by sampling single-language episodes during episodic prototypical training, yielding gains on the TidyVoice benchmark across backbones.
L-Proto: Language-Aware Episodic Prototypical Training for Multilingual Speaker Verification
abstract
Multilingual speaker verification remains challenging because language-dependent acoustic variability causes speaker identity to become entangled with linguistic characteristics, degrading generalization across languages. In multilingual training, embeddings often encode language cues with speaker identity, causing speakers to form language-specific clusters. We propose L-Proto, a language-aware episodic prototypical training strategy that constructs language-consistent episodes. By sampling speakers from a single language per episode, L-Proto reduces language-driven variation during training and encourages embeddings to focus more directly on speaker identity. Experiments on the TidyVoice Challenge benchmark demonstrate consistent performance improvements over conventional fine-tuning and random episodic sampling across multiple backbone architectures.
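The language-consistent episode construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_language_episode`, the `(utt_id, speaker_id, language)` tuple layout, the uniform choice of language, and the support/query split sizes are all hypothetical, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def sample_language_episode(utterances, n_speakers, n_support, n_query, rng=None):
    """Sample one episode whose speakers all come from a single language.

    utterances: iterable of (utt_id, speaker_id, language) tuples (assumed layout).
    Returns (language, {speaker_id: (support_utts, query_utts)}).
    """
    rng = rng or random.Random()
    per_speaker = n_support + n_query

    # Group utterance ids by language, then by speaker within that language.
    by_lang = defaultdict(lambda: defaultdict(list))
    for utt_id, speaker, lang in utterances:
        by_lang[lang][speaker].append(utt_id)

    # A language is usable only if enough of its speakers have enough utterances.
    eligible = {}
    for lang, speakers in by_lang.items():
        ok = [s for s, utts in speakers.items() if len(utts) >= per_speaker]
        if len(ok) >= n_speakers:
            eligible[lang] = ok
    if not eligible:
        raise ValueError("no language has enough speakers for this episode size")

    # Assumption: languages are drawn uniformly; the source does not say how.
    lang = rng.choice(list(eligible))

    episode = {}
    for speaker in rng.sample(eligible[lang], n_speakers):
        utts = rng.sample(by_lang[lang][speaker], per_speaker)
        # Disjoint split: support utterances build the prototype, queries are scored.
        episode[speaker] = (utts[:n_support], utts[n_support:])
    return lang, episode


# Toy metadata: 2 languages x 3 speakers x 4 utterances each (made-up ids).
toy = [
    (f"{lang}-spk{s}-utt{i}", f"{lang}-spk{s}", lang)
    for lang in ("en", "de")
    for s in range(3)
    for i in range(4)
]
lang, episode = sample_language_episode(toy, n_speakers=2, n_support=1, n_query=2,
                                        rng=random.Random(0))
```

In a prototypical training loop, each speaker's support embeddings would be averaged into a prototype and the query embeddings classified against those prototypes; because every speaker in the episode shares one language, language cues cannot help separate them. A random episodic baseline would correspond to dropping the per-language grouping.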
fields: cs.SD
years: 2026
verdicts: UNVERDICTED