VoxCeleb: a large-scale speaker identification dataset
2 Pith papers cite this work.
Fields: cs.SD (2)
Years: 2019 (2)
Verdicts: UNVERDICTED (2)

Citing papers:
- Who said that?: Audio-visual speaker diarisation of real-world meetings
  An iterative audio-visual approach for speaker diarisation in real-world meetings that enrolls speaker models via correspondence and outperforms prior methods on the AMI corpus.
- Self Multi-Head Attention for Speaker Recognition
  Self multi-head attention applied after CNN encoding of spectrograms outperforms temporal and statistical pooling for speaker verification on VoxCeleb1, achieving an 18% relative EER reduction.