MUSAN: A Music, Speech, and Noise Corpus
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SD (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification
  Hyperbolic Softmax and HAM-Softmax in hyperbolic space reduce equal error rates by 27.84% and 14.23% on average versus standard Softmax and AM-Softmax by modeling hierarchical speaker features.
- XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association
  XM-ALIGN improves face-voice association performance by jointly optimizing embeddings from separate encoders with MSE alignment loss and data augmentation on the MAV-Celeb dataset.