Interspeech 2021 Deep Noise Suppression Challenge

Arun Nair; Chandan K A Reddy; Hannes Gamper; Harishchandra Dubey; Kazuhito Koishida; Robert Aichner; Ross Cutler; Sebastian Braun; Sriram Srinivasan; Vishak Gopal

arxiv: 2101.01902 · v3 · pith:LVQNCKTMnew · submitted 2021-01-06 · 💻 cs.SD · cs.LG · eess.AS

classification 💻 cs.SD · cs.LG · eess.AS

keywords challenge · noise · band · interspeech · quality · scenarios · speech · suppression

The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, which was also used to evaluate participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios. The two tracks in this challenge will focus on real-time denoising for (i) wide band, and (ii) full band scenarios. We are also making available a reliable non-intrusive objective speech quality metric called DNSMOS for the participants to use during their development phase.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DASH: Dual-View Self-Distillation with Multi-Layer Hidden Representations for Robust Speech Recognition
eess.AS 2026-06 unverdicted novelty 4.0

DASH applies dual-view self-distillation on multi-layer representations and prototype distributions to boost ASR noise robustness while keeping clean accuracy.

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey
cs.CV 2026-04 unverdicted novelty 4.0

The paper offers the first focused review of MLLM-based video translation organized by a three-role taxonomy of Semantic Reasoner, Expressive Performer, and Visual Synthesizer, plus open challenges.