Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

2026 · eess.AS · arXiv 2603.15988

3 Pith papers cite this work. Polarity classification is still indexing.


abstract

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
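The headline figure in the abstract is an average SRCC (Spearman rank correlation coefficient) over several unseen test sets. As a rough illustration of how such a number can be computed, the sketch below ranks predicted and reference severity scores (ties share the mean rank), correlates the ranks, and takes an unweighted mean over datasets. The function names, the toy dataset names, and the unweighted averaging are assumptions made for illustration; the abstract does not state how the paper aggregates its scores.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, reference):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(reference))


def average_srcc(datasets):
    """Unweighted mean SRCC over {name: (predicted, reference)} (assumed aggregation)."""
    return mean(srcc(pred, ref) for pred, ref in datasets.values())


# Hypothetical toy scores, not data from the paper.
toy = {
    "dataset_a": ([0.2, 0.5, 0.9, 0.4], [1, 2, 4, 3]),
    "dataset_b": ([3.1, 1.0, 2.2], [3, 1, 2]),
}
toy_average = average_srcc(toy)
```

Because SRCC depends only on rank order, a predictor can score well even if its outputs sit on a different scale from the clinical ratings.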

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

cs.CL · 2026-04-11 · unverdicted · novelty 7.0

A training-free method quantifies dysarthria severity via d-prime scores on phonological contrasts in HuBERT embeddings, correlating with clinical ratings across 5 languages and multiple conditions.
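For readers unfamiliar with d-prime, the sketch below shows the standard sensitivity index from signal detection theory: the separation of two class means divided by the root mean of their variances. It assumes each phonological class has already been reduced to one-dimensional values (for example, projections of embeddings onto a contrast axis) and uses the sample variance; both choices, and the function name, are assumptions, since the summary above does not give the citing paper's exact formulation.

```python
from statistics import mean, variance


def d_prime(class_a, class_b):
    """Sensitivity index between two 1-D samples (equal-weight pooled variance)."""
    # Root mean of the two sample variances; weighting both classes equally
    # is an assumption of this sketch.
    pooled_sd = ((variance(class_a) + variance(class_b)) / 2) ** 0.5
    return abs(mean(class_a) - mean(class_b)) / pooled_sd


# Hypothetical 1-D projections for two sides of a phonological contrast.
well_separated = d_prime([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
overlapping = d_prime([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
```

Under this definition, a smaller d-prime means the two classes overlap more, which is the direction one would expect as a contrast becomes less distinct.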

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

eess.AS · 2026-03-16 · unverdicted · novelty 6.0

A three-stage pseudo-labeling and contrastive learning framework achieves average SRCC of 0.761 on five unseen dysarthric speech datasets for robust severity estimation.

Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

cs.CL · 2026-04-23 · unverdicted · novelty 4.0

Phonological subspace collapse in SSL speech representations produces aetiology-specific degradation profiles that remain stable in shape across languages and model architectures.

citing papers explorer

Showing 3 of 3 citing papers.

Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations
cs.CL · 2026-04-11 · unverdicted · none · ref 15 · internal anchor
A training-free method quantifies dysarthria severity via d-prime scores on phonological contrasts in HuBERT embeddings, correlating with clinical ratings across 5 languages and multiple conditions.

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech
eess.AS · 2026-03-16 · unverdicted · none · ref 1 · internal anchor
A three-stage pseudo-labeling and contrastive learning framework achieves average SRCC of 0.761 on five unseen dysarthric speech datasets for robust severity estimation.

Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers
cs.CL · 2026-04-23 · unverdicted · none · ref 14 · internal anchor
Phonological subspace collapse in SSL speech representations produces aetiology-specific degradation profiles that remain stable in shape across languages and model architectures.
