Using recurrences in time and frequency within U-net architecture for speech enhancement

Szymon Drgas; Tomasz Grzywalski

arxiv: 1811.06805 · v1 · pith:SEZBI3N6new · submitted 2018-11-16 · cs.LG · cs.SD · eess.AS · stat.ML


classification cs.LG · cs.SD · eess.AS · stat.ML

keywords network · solution · speech · enhancement · layers · models · ratio · advantage


When designing a fully convolutional neural network, there is a trade-off between receptive field size, number of parameters, and spatial resolution of features in the deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves this trade-off. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined with noise segments extracted from the NOISEX-92 database and show a clear advantage of the proposed solution in terms of SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio), and STOI (short-time objective intelligibility) compared to the current state of the art.
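The trade-off named in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it is a minimal 1-D example, with hypothetical helper names (`receptive_field`, `conv_params`) and an arbitrarily chosen channel count, that compares four stride-1 convolutions with four stride-2 convolutions of the same kernel size. Both stacks have the same number of weights; the strided stack sees a much wider input context but produces features at a much coarser resolution.

```python
def receptive_field(layers):
    """Return (receptive_field, downsampling_factor) for a stack of 1-D convs.

    `layers` is a list of (kernel_size, stride) tuples applied in order.
    This is the standard recurrence for plain (non-dilated) convolutions.
    """
    rf, jump = 1, 1  # jump = distance in input samples between adjacent outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf, jump


def conv_params(layers, channels):
    """Weights plus biases, assuming every layer maps `channels` -> `channels`."""
    return sum(k * channels * channels + channels for k, _ in layers)


plain = [(3, 1)] * 4    # four stride-1 layers: full resolution kept
strided = [(3, 2)] * 4  # four stride-2 layers: resolution halves at each layer

channels = 16  # illustrative assumption, not a value from the paper
print(receptive_field(plain), conv_params(plain, channels))
print(receptive_field(strided), conv_params(strided, channels))
```

With these illustrative settings the stride-1 stack covers 9 input samples at full resolution, while the stride-2 stack covers 31 samples but keeps only one output per 16 input samples, for an identical parameter count.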
