Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

Binling Wang; Feng Tong; Jiayang Zhang; Jie Wang; Lin Li; Qingyang Hong; Shipeng Xia; Song Li; Yiming Zhi; Yuji Liu

arXiv: 2209.12002 · v1 · pith:A7IUOSUMnew · submitted 2022-09-24 · eess.AS · cs.SD

keywords: diarization, multi-channel, speaker, DMSNet, system, array, ASDB, beamforming

This paper describes a spatial-aware speaker diarization system for multi-channel multi-party meetings. The diarization system obtains speaker direction information from a microphone array. The speaker spatial embedding is generated from an x-vector and an s-vector derived from superdirective beamforming (SDB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture, the discriminative multi-stream neural network (DMSNet), which consists of an attention superdirective beamforming (ASDB) block and a Conformer encoder. The proposed ASDB is a self-adaptive channel-wise block that extracts latent spatial features from the array audio by modeling interdependencies between channels. We use DMSNet to address the overlapped speech problem in multi-channel audio and achieve 93.53% accuracy on the evaluation set. With a DMSNet-based overlapped speech detection (OSD) module, the diarization error rate (DER) of the cluster-based diarization system decreases significantly from 13.45% to 7.64%.
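The abstract does not state which superdirective beamforming formulation the system uses. A common one, assumed here, is the MVDR beamformer under a spherically diffuse noise field: w = Gamma^{-1} d / (d^H Gamma^{-1} d), where d is the far-field steering vector toward the target direction and Gamma is the diffuse-noise coherence matrix with entries sinc(2*pi*f*d_ij/c). The sketch below computes these weights for a single frequency bin in plain Python; the speed of sound, the diagonal-loading value, the 8-mic circular array and all function names are illustrative assumptions, not details from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def solve(a, b):
    """Solve a @ x = b for a small dense complex system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def steering_vector(mic_xy, azimuth, freq):
    """Far-field steering vector for a planar array and a source at `azimuth` (radians)."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    omega = 2 * math.pi * freq
    # Mics with a larger projection onto the source direction receive the wave earlier.
    return [cmath.exp(1j * omega * (x * ux + y * uy) / SPEED_OF_SOUND) for x, y in mic_xy]


def diffuse_coherence(mic_xy, freq, loading=1e-2):
    """Diffuse-noise coherence sinc(2*pi*f*d_ij/c) with diagonal loading for robustness."""
    n = len(mic_xy)
    gamma = []
    for i in range(n):
        row = []
        for j in range(n):
            arg = 2 * math.pi * freq * math.dist(mic_xy[i], mic_xy[j]) / SPEED_OF_SOUND
            row.append(complex(1.0 if arg == 0 else math.sin(arg) / arg))
        row[i] += loading  # diagonal loading keeps the matrix well conditioned
        gamma.append(row)
    return gamma


def superdirective_weights(mic_xy, azimuth, freq, loading=1e-2):
    """w = Gamma^{-1} d / (d^H Gamma^{-1} d): unit gain toward `azimuth`, minimal diffuse-noise power."""
    d = steering_vector(mic_xy, azimuth, freq)
    g_inv_d = solve(diffuse_coherence(mic_xy, freq, loading), d)
    denom = sum(di.conjugate() * gi for di, gi in zip(d, g_inv_d))
    return [gi / denom for gi in g_inv_d]


def beam_response(weights, mic_xy, azimuth, freq):
    """Complex beamformer response w^H d(azimuth)."""
    d = steering_vector(mic_xy, azimuth, freq)
    return sum(w.conjugate() * di for w, di in zip(weights, d))


# Hypothetical 8-mic uniform circular array with a 5 cm radius.
mics = [(0.05 * math.cos(2 * math.pi * k / 8), 0.05 * math.sin(2 * math.pi * k / 8)) for k in range(8)]
w = superdirective_weights(mics, math.radians(60), 1000.0)
print(abs(beam_response(w, mics, math.radians(60), 1000.0)))  # ~1.0 (distortionless constraint)
```

Diagonal loading trades some directivity for robustness to array mismatch and sensor noise; smaller `loading` values give a sharper but less stable beam.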
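The abstract says the speaker spatial embedding is built from an x-vector and an SDB-derived s-vector, but not how the two are combined. One simple fusion, assumed here purely for illustration, is to L2-normalize each vector, scale the spatial part by a weight, concatenate, and renormalize, so that cosine similarity between segments reflects both voice and direction. The `spatial_weight` parameter, the toy vectors and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spatial_speaker_embedding(xvector, svector, spatial_weight=0.5):
    """Hypothetical fusion: concatenate unit x-vector and weighted unit s-vector, then renormalize."""
    xv = l2_normalize(xvector)
    sv = [spatial_weight * s for s in l2_normalize(svector)]
    return l2_normalize(xv + sv)


def cosine_unit(a, b):
    """Cosine similarity for vectors that are already unit length (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Same (toy) voice observed from two different directions.
a = spatial_speaker_embedding([0.6, 0.8], [1.0, 0.0])
b = spatial_speaker_embedding([0.6, 0.8], [0.0, 1.0])
print(cosine_unit(a, b))  # ~0.8
```

With identical x-vectors but different directions, the similarity falls below 1, illustrating how a spatial component can pull apart segments that sound alike.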
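The abstract describes ASDB only as a self-adaptive channel-wise block that models interdependencies between channels; its internal structure is not given here. As a rough analogue, the sketch below uses a squeeze-and-excitation style gate, a different and well-known technique, not the paper's ASDB: each microphone channel is summarized by its time average, a small bottleneck MLP mixes these summaries across channels, and a sigmoid gate rescales each channel. The random weights stand in for learned parameters, and the shapes and names are assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x, w1, b1, w2, b2):
    """Squeeze-and-excitation style gating over microphone channels.

    x:  list of C channels, each a list of T frame values.
    w1: H x C matrix, b1: length H; w2: C x H matrix, b2: length C.
    Returns (gated features with the same shape as x, per-channel gates).
    """
    # Squeeze: summarize each channel by its mean over time.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck MLP (ReLU then sigmoid) that mixes information across channels.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b) for row, b in zip(w1, b1)]
    gates = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(w2, b2)]
    # Scale: reweight every frame of channel c by its gate.
    gated = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return gated, gates


rng = random.Random(0)
C, H, T = 4, 2, 10  # hypothetical: 4 mics, bottleneck width 2, 10 frames
x = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(H)]
w2 = [[rng.gauss(0, 0.5) for _ in range(H)] for _ in range(C)]
gated, gates = channel_gate(x, w1, [0.0] * H, w2, [0.0] * C)
print([round(g, 3) for g in gates])
```

Because each gate depends on all channel summaries through the shared MLP, the weight given to one microphone can change with what the others observe, which is the kind of cross-channel interdependency the abstract refers to.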
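For reference, DER is conventionally computed as the sum of missed speech, false-alarm speech and speaker-confusion time divided by the total reference speech time. In a cluster-based system that assigns one speaker per segment, speech from a second, overlapping speaker typically counts as missed speech, which is the error an OSD module is meant to reduce. The helper below implements that definition and works out the relative improvement implied by the reported numbers: going from 13.45% to 7.64% is a reduction of (13.45 - 7.64) / 13.45, about 43.2%. The durations in the usage line are made-up illustrative values, and the function names are hypothetical.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed speech + false alarm + speaker confusion) / total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(before, after):
    """Fractional reduction from `before` to `after` (e.g. of a DER)."""
    return (before - after) / before


# Illustrative durations in seconds (not from the paper).
print(diarization_error_rate(30.0, 20.0, 10.0, 600.0))  # 0.1
# Relative DER reduction reported in the abstract.
print(f"{relative_reduction(13.45, 7.64):.1%}")  # 43.2%
```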
