Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Laurent Girin; Radu Horaud; Simon Leglaive

arXiv: 1811.06713 · v3 · pith:LJNW25PE · submitted 2018-11-16 · cs.SD · eess.AS · stat.ML

Simon Leglaive, Laurent Girin, Radu Horaud

classification: cs.SD · eess.AS · stat.ML

keywords: speech, modeling, multichannel, autoencoders, enhancement, factorization, framework, matrix

In this paper, we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network to model the spectro-temporal content of speech. The parameters of this supervised model are learned within the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the spectro-temporal modeling of the noise remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and experimentally show that the proposed approach outperforms its NMF-based counterpart, in which speech is modeled using supervised NMF.
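To make the modeling assumptions in the abstract concrete, here is a minimal NumPy sketch of a multichannel local Gaussian model in which the speech variance would come from a VAE decoder (stubbed here with random positive values) and the noise variance is an NMF product W @ H. For fixed parameters, it computes multichannel Wiener (posterior-mean) estimates of the speech and noise spatial images. All names (`wiener_images`, `random_hpd`), the array shapes (F frequency bins, N frames, I channels, NMF rank K) and the exact parametrization, including a per-frame speech gain, are illustrative assumptions rather than the authors' implementation. The VAE decoder and the Monte Carlo EM parameter updates are not shown.

```python
import numpy as np


def wiener_images(x, speech_var, gain, R_s, noise_var, R_b):
    """Multichannel Wiener estimates of the speech and noise spatial images.

    Assumed model (per frequency f and frame n), illustrative only:
        x_fn = s_fn + b_fn,
        s_fn ~ CN(0, gain_n * speech_var_fn * R_s[f]),
        b_fn ~ CN(0, noise_var_fn * R_b[f]).

    x:          (F, N, I) complex STFT of the I-channel mixture
    speech_var: (F, N) speech variances (placeholder for a VAE decoder output)
    gain:       (N,) frame-dependent speech gain (assumed parametrization)
    R_s, R_b:   (F, I, I) Hermitian positive-definite spatial covariances
    noise_var:  (F, N) noise variances, e.g. the NMF product W @ H
    """
    # Per-bin covariances of the speech and noise images, shape (F, N, I, I)
    sigma_s = (speech_var * gain[None, :])[..., None, None] * R_s[:, None]
    sigma_b = noise_var[..., None, None] * R_b[:, None]
    # y = Sigma_x^{-1} x per bin, via a linear solve instead of an explicit inverse
    y = np.linalg.solve(sigma_s + sigma_b, x[..., None])[..., 0]
    # Posterior means: s_hat = Sigma_s y, b_hat = Sigma_b y
    s_hat = np.einsum("fnij,fnj->fni", sigma_s, y)
    b_hat = np.einsum("fnij,fnj->fni", sigma_b, y)
    return s_hat, b_hat


def random_hpd(rng, F, I):
    """Random Hermitian positive-definite matrices, shape (F, I, I)."""
    A = rng.standard_normal((F, I, I)) + 1j * rng.standard_normal((F, I, I))
    return A @ A.conj().transpose(0, 2, 1) + 1e-3 * np.eye(I)


# Toy usage with random data (sizes are arbitrary)
rng = np.random.default_rng(0)
F, N, I, K = 4, 5, 2, 3
W = rng.random((F, K))                  # NMF dictionary for the noise
H = rng.random((K, N))                  # NMF activations for the noise
noise_var = W @ H                       # non-negative noise variances
speech_var = rng.random((F, N)) + 0.1   # stand-in for decoder variances
gain = np.ones(N)
R_s, R_b = random_hpd(rng, F, I), random_hpd(rng, F, I)
x = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))
s_hat, b_hat = wiener_images(x, speech_var, gain, R_s, noise_var, R_b)
```

Because the speech and noise covariances sum to the mixture covariance in this model, the two estimates add back up to the mixture x. In a Monte Carlo EM setting, the speech estimate would presumably be averaged over samples of the VAE latent variables, which this sketch omits.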
