DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

Chandan K A Reddy; Ross Cutler; Vishak Gopal

arXiv: 2010.15258 · v2 · pith:YMOW6T6Vnew · submitted 2020-10-28 · cs.SD · cs.LG · eess.AS

classification cs.SD cs.LG eess.AS

keywords evaluate, human, objective, perceptual, metrics, noise, speech, metric

Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. This paper introduces a multi-stage self-teaching based perceptual objective metric that is designed to evaluate noise suppressors. The proposed method generalizes well in challenging test conditions with a high correlation to human ratings.
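The abstract's claim of "a high correlation to human ratings" is typically checked by correlating a metric's predicted scores with mean opinion scores (MOS) from listeners. The following is a minimal sketch of that check using the Pearson correlation coefficient; it is not the paper's evaluation code, and the function name `pearson` and all score values are hypothetical, made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one human MOS and one predicted score per audio clip.
human_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted = [2.0, 2.4, 3.3, 3.7, 4.5]

pcc = pearson(human_mos, predicted)
print(f"Pearson correlation with human ratings: {pcc:.3f}")
```

A value close to 1 would indicate that the objective metric ranks and scales clips much as listeners do; rank-based measures such as Spearman correlation are a common complement when only the ordering matters.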

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy
cs.CL · 2026-05 · unverdicted · novelty 5.0

SSL speech representations outperform hand-crafted features at lower cognitive hierarchy levels but reverse for MCI classification, with greater response freedom in tasks linked to performance dilution at higher level...

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents
cs.CL · 2026-05 · unverdicted · novelty 4.0

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.