Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

· 2026 · eess.AS · arXiv 2604.13528

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

In this paper, we introduce GatherMOS, a novel framework that leverages large language models (LLMs) as meta-evaluators to aggregate diverse signals into quality predictions. GatherMOS integrates lightweight acoustic descriptors with pseudo-labels from DNSMOS and VQScore, enabling the LLM to reason over heterogeneous inputs and infer perceptual mean opinion scores (MOS). We further explore both zero-shot and few-shot in-context learning setups, showing that zero-shot GatherMOS maintains stable performance across diverse conditions, while few-shot guidance yields large gains when support samples match the test conditions. Experiments on the VoiceBank-DEMAND dataset demonstrate that GatherMOS consistently outperforms DNSMOS, VQScore, naive score averaging, and even learning-based models such as CNN-BLSTM and MOS-SSL when those models are trained under limited labeled-data conditions. These results highlight the potential of LLM-based aggregation as a practical strategy for non-intrusive speech quality evaluation.
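
The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the descriptor names, the function names (`build_prompt`, `naive_average`, `parse_mos`), and the assumption that all pseudo-labels are already on a common 1-5 scale are hypothetical choices made only for illustration. The sketch formats acoustic descriptors and pseudo-labels into a zero-shot or few-shot prompt, computes the naive averaging baseline, and parses a numeric reply; the call to an actual LLM is deliberately left out.

```python
import re
from statistics import mean


def _fmt(values):
    """Render a dict of named scores as 'name=1.23, other=4.56' (sorted by name)."""
    return ", ".join(f"{k}={v:.2f}" for k, v in sorted(values.items()))


def naive_average(pseudo_labels):
    """Baseline: plain mean of the pseudo-labels.

    Assumption: every score has already been mapped to the same 1-5 scale.
    """
    return mean(pseudo_labels.values())


def build_prompt(descriptors, pseudo_labels, support=()):
    """Build a text prompt for an LLM acting as a meta-evaluator.

    `support` is an optional sequence of (descriptors, pseudo_labels, mos)
    tuples; leaving it empty gives a zero-shot prompt, filling it gives a
    few-shot one. The wording here is hypothetical, not taken from the paper.
    """
    lines = ["Rate the speech quality of the test utterance on a 1-5 MOS scale."]
    for i, (desc, labels, mos) in enumerate(support, start=1):
        lines.append(f"Example {i}: {_fmt(desc)}; {_fmt(labels)} -> MOS {mos:.2f}")
    lines.append(f"Test utterance: {_fmt(descriptors)}; {_fmt(pseudo_labels)}")
    lines.append("Answer with a single number.")
    return "\n".join(lines)


def parse_mos(reply):
    """Extract the first number from an LLM reply and clamp it to [1, 5]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError("no numeric score found in reply")
    return min(5.0, max(1.0, float(match.group())))
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is available and the reply passed to `parse_mos`; `naive_average` gives the simple baseline the abstract compares against.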

representative citing papers

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

eess.AS · 2026-04-15 · unverdicted · novelty 6.0

GatherMOS uses LLMs as meta-evaluators to aggregate acoustic features and pseudo-labels for improved mean opinion score prediction in few-shot speech quality assessment.

citing papers explorer

Showing 1 of 1 citing paper.

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

eess.AS · 2026-04-15 · unverdicted · none · ref 1 · internal anchor

GatherMOS uses LLMs as meta-evaluators to aggregate acoustic features and pseudo-labels for improved mean opinion score prediction in few-shot speech quality assessment.
