arXiv: 1808.05344 · v2 · pith:QXS724YCnew · submitted 2018-08-16 · 💻 cs.SD · cs.AI · eess.AS

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM

Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, Hsin-Min Wang

classification 💻 cs.SD · cs.AI · eess.AS

keywords speech, quality, assessment, quality-net, reference, clean, evaluation, non-intrusive

Most objective speech quality assessment tools (e.g., perceptual evaluation of speech quality (PESQ)) compare the degraded or processed speech with its clean counterpart. The need for a "golden" reference considerably restricts the practicality of such tools in real-world scenarios, since the clean reference usually cannot be accessed. Human listeners, on the other hand, can readily evaluate speech quality without any reference (e.g., in mean opinion score (MOS) tests), which implies the existence of an objective, non-intrusive (no clean reference needed) quality assessment mechanism. In this study, we propose a novel end-to-end, non-intrusive speech quality evaluation model, termed Quality-Net, based on bidirectional long short-term memory (BLSTM). The utterance-level quality estimate in Quality-Net is based on frame-level assessment. Frame constraints and sensible initializations of the forget gate biases are applied to learn meaningful frame-level quality assessment from the utterance-level quality label. Experimental results show that Quality-Net yields high correlation with PESQ (0.9 for noisy speech and 0.84 for speech processed by speech enhancement). We believe that Quality-Net has the potential to be used in a wide variety of speech signal processing applications.
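The abstract does not spell out how frame-level scores become an utterance-level score, or what form the frame constraint takes. The following is only a minimal sketch under stated assumptions: the utterance score is taken to be the mean of the frame scores, and the frame constraint is modeled as a squared-error term pulling each frame score toward the utterance label with a fixed weight. The names `utterance_score`, `quality_loss`, `frame_weight`, and `pearson` are hypothetical and do not come from the paper. The last function shows how a correlation figure such as the reported 0.9 against PESQ could be computed from predicted and reference scores.

```python
from statistics import fmean


def utterance_score(frame_scores):
    """Utterance-level quality as the mean of frame-level scores (assumed pooling)."""
    return fmean(frame_scores)


def quality_loss(frame_scores, target, frame_weight=1.0):
    """Utterance-level squared error plus an assumed frame-constraint term.

    The frame term pulls every frame score toward the utterance-level label.
    Its exact form and the fixed `frame_weight` are illustrative assumptions,
    not the loss used by the authors.
    """
    utt_err = (target - utterance_score(frame_scores)) ** 2
    frame_err = fmean([(target - q) ** 2 for q in frame_scores])
    return utt_err + frame_weight * frame_err


def pearson(xs, ys):
    """Pearson correlation between predicted scores and reference scores."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

In this sketch, a prediction whose frame scores average to the label but fluctuate widely is still penalized by the frame term, which is one plausible way an utterance-level label could shape frame-level outputs.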

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
cs.SD 2025-02 unverdicted novelty 6.0

Unified no-reference models assess audio aesthetics across speech, music, and sound via four perceptual axes and achieve performance comparable or superior to human mean opinion scores.