Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

arxiv: 2604.10123 · v1 · submitted 2026-04-11 · 💻 cs.CL · cs.LG

Bernard Muller, Antonio Armando Ortiz Barrañón, LaVonne Roberts

Pith reviewed 2026-05-10 16:11 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords dysarthria severity · phonological features · self-supervised speech · training-free assessment · cross-lingual · d-prime scores · forced alignment · HuBERT embeddings

The pith

Dysarthria severity can be measured from degradation along phonological contrast directions in frozen speech representations, with directions defined only from healthy control speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a training-free approach to quantify dysarthria severity by tracking how specific phonological features degrade in pretrained speech embeddings. Feature directions for contrasts such as nasality, voicing, and manner are estimated exclusively from healthy speakers using a forced aligner, then applied to patient embeddings to compute d-prime scores that form a 12-dimensional profile. These scores show consistent negative correlations with clinical severity ratings across 890 speakers, five languages, and three aetiologies, with effects that hold after multiple robustness checks. A sympathetic reader would care because the method needs no labelled dysarthric data and works for any language that already has an acoustic alignment model, removing a major barrier to scalable clinical assessment.

Core claim

By extracting phone-level embeddings from frozen HuBERT representations and computing d-prime scores along phonological feature directions estimated solely from healthy control speech, the resulting 12-dimensional profiles correlate significantly with clinical severity (random-effects meta-analysis rho from -0.50 to -0.56), with all five consonant features surviving multiple testing corrections and remaining stable under leave-one-corpus-out validation.

What carries the argument

D-prime scores computed along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) that are derived exclusively from healthy control speech within frozen HuBERT embeddings; these scores quantify per-speaker degradation relative to the healthy reference subspace.
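The scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a contrast direction is the unit-normalised difference between the two class means in healthy-control phone embeddings, and that d′ is the separation of projected token means divided by their pooled standard deviation, as in standard signal detection theory. The function names `contrast_direction` and `dprime_along`, the synthetic data, and the 768-dimensional size (the hidden width of HuBERT Base) are illustrative assumptions.

```python
import numpy as np


def contrast_direction(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the negative-class mean to the positive-class mean.

    pos, neg: (n_tokens, dim) phone embeddings from healthy controls only,
    e.g. nasals (m, n, ng) vs oral stops (p, b, t, d, k, g) for nasality.
    """
    w = pos.mean(axis=0) - neg.mean(axis=0)
    return w / np.linalg.norm(w)


def dprime_along(w: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    """d' of one speaker's tokens after projection onto a fixed direction w."""
    a, b = pos @ w, neg @ w
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return float((a.mean() - b.mean()) / pooled_sd)


# Synthetic demo (hypothetical data): the contrast lives on the first axis.
rng = np.random.default_rng(0)
DIM = 768  # hidden size of HuBERT Base
axis = np.zeros(DIM)
axis[0] = 1.0


def tokens(n: int, shift: float) -> np.ndarray:
    """n isotropic Gaussian tokens, displaced by `shift` along the contrast axis."""
    return rng.normal(size=(n, DIM)) + shift * axis


# Direction estimated from healthy-control tokens only, then frozen.
w = contrast_direction(tokens(400, 3.0), tokens(400, 0.0))

healthy_dprime = dprime_along(w, tokens(60, 3.0), tokens(60, 0.0))
collapsed_dprime = dprime_along(w, tokens(60, 1.0), tokens(60, 0.0))  # contrast partly lost
print(round(healthy_dprime, 2), round(collapsed_dprime, 2))
```

Because `w` is frozen after estimation, a speaker whose two phone classes drift toward each other along that axis receives a lower d′, which is the per-speaker degradation relative to the healthy reference subspace described above.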

If this is right

The same healthy-control directions produce reliable severity correlations in each of the five tested languages without any language-specific retraining.
Nasality d-prime decreases monotonically with increasing severity in six of the seven severity-graded corpora.
All twelve phonological features separate healthy controls from severely dysarthric speakers at p less than 0.001.
The pipeline can be deployed for any of the 29 languages that already have a Montreal Forced Aligner model.
No dysarthric speech data is required to build or adapt the severity estimator for a new clinical setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The stability of the correlations under leave-one-corpus-out removal suggests the phonological degradation signal is not tied to any single recording condition or disease subtype.
Because the method separates controls from severe cases across aetiologies, it could serve as an initial screening layer before more detailed clinical evaluation.
Extending the set of phonological directions or testing other frozen self-supervised models might strengthen the observed correlations without introducing supervised training.
The requirement for only an existing aligner model implies the approach could transfer quickly to additional languages once their acoustic models become available.

Load-bearing premise

Phonological feature directions estimated only from healthy control speech using a pretrained forced aligner still capture the main degradation patterns that occur in dysarthric speech across languages and disease types.

What would settle it

Observing no significant correlation between the d-prime phonological scores and independent clinical severity ratings in a new corpus from an additional language or aetiology, after applying the same healthy-control direction estimation, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.10123 by Antonio Armando Ortiz Barrañón, Bernard Muller, LaVonne Roberts.

**Figure 1.** Phonological subspace collapse in dysarthric speech. UMAP projection of phone-level HuBERT embeddings for one healthy control speaker (MC01, left) and one speaker with severe dysarthria (M01, right) from the TORGO corpus. Points represent individual phone tokens coloured by class: nasal consonants (m, n, ng) vs. oral stops (p, b, t, d, k, g). In the healthy speaker, the two classes form tight, well-separa…

**Figure 2.** Clinical validation: d′ correlates with clinician-rated intelligibility across languages. Left: stridency d′ vs intelligibility (Kendall’s τ = 0.407, p = 2.9 × 10⁻⁶, n = 64). Right: mean consonant d′ vs intelligibility (τ = 0.323, p = 3.9 × 10⁻⁵, n = 86). Data from UA-Speech (English, cerebral palsy) and COPAS (Dutch, mixed aetiologies). Trendlines show positive linear association. We test this hypoth…

**Figure 3.** Left panel: group means for all reported features across control/mild/moderate/severe stages (…

**Figure 4.** Consonant d′ degradation by severity across corpora. Each panel displays one of the five consonant d′ features (nasality, sonorant, voicing, strident, manner) with speaker-level means plotted across severity groups (control, mild, moderate, severe) for each corpus independently. Lines connect severity group means within each corpus. All five features show a consistent downward trend from control to sever…

**Figure 5.** Severity correlation forest plot for all 12 phonological features. Spearman ρ between each feature and ordinal severity (control = 0, mild = 1, moderate = 2, severe = 3), pooled across all corpora. Horizontal bars show bootstrap 95% confidence intervals (1,000 iterations). Features are sorted by effect size. Dark navy: consonant d′ features; light blue: vowel d′ features; orange: structural metrics. All …
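The pooled correlations in Figure 5 pair each feature with ordinal severity (0 to 3) and attach percentile bootstrap intervals. A numpy-only sketch of that computation, assuming speakers are resampled with replacement and Spearman's ρ is taken as the Pearson correlation of tie-averaged ranks; `bootstrap_spearman_ci` and the synthetic data are hypothetical, not the paper's pipeline.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    inv = np.empty_like(order)
    inv[order] = np.arange(len(x))
    xs = x[order]
    starts_group = np.r_[True, xs[1:] != xs[:-1]]
    group = np.cumsum(starts_group)[inv]  # 1-based tie-group id of each element
    edges = np.r_[np.flatnonzero(starts_group), len(x)]
    return 0.5 * (edges[group] + edges[group - 1] + 1)


def spearman(x, y) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def bootstrap_spearman_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Point estimate plus a percentile bootstrap CI over resampled speakers."""
    x, y = np.asarray(x), np.asarray(y)
    rng = np.random.default_rng(seed)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        boots.append(spearman(x[idx], y[idx]))
    tail = 100 * (1 - level) / 2
    lo, hi = np.nanpercentile(boots, [tail, 100 - tail])  # ignore degenerate (NaN) resamples
    return spearman(x, y), (float(lo), float(hi))


# Hypothetical data: 25 speakers per severity level, d' falling with severity.
rng = np.random.default_rng(1)
severity = np.repeat([0, 1, 2, 3], 25)  # control, mild, moderate, severe
dprime = 3.0 - 0.6 * severity + rng.normal(0.0, 0.5, size=severity.size)
rho, (ci_lo, ci_hi) = bootstrap_spearman_ci(dprime, severity)
```

Resampling whole speakers, rather than tokens, keeps the interval at the level at which severity is rated.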

**Figure 6.** Consonant d′ profiles across 867 speakers sorted by severity. Each column is one speaker; each row is one consonant d′ feature. Speakers are sorted left to right by severity (control, mild, moderate, severe), then by mean d′ within each group. Top strips show severity (green to red) and aetiology (colour-coded). The green-to-red gradient across all five features demonstrates that phonological subspace d…

**Figure 7.** Phonological fingerprints by aetiology. Radar plots showing the mean consonant d′ profile for cerebral palsy (CP, n = 86), Parkinson’s disease (PD, n = 211), and amyotrophic lateral sclerosis (ALS, n = 50) speakers. Each axis represents one consonant d′ feature; the green shaded area shows the healthy control reference profile. CP shows the most uniform degradation across all features, while PD and ALS e…

**Figure 8.** Figure 8 visualises this relationship. This occurs because d′ estimation is biased upward with more observations: the sample means more closely approximate the true population means when n is large, reducing the pooled standard deviation relative to the mean difference. The practical consequence is that absolute d′ values are not comparable across corpora with different amounts of speech per speaker. For example…

**Figure 9.** ROC curves for binary severity detection. Left: severe vs. rest (stridency d′ AUC = 0.890). Right: moderate-or-worse vs. mild/control. All five consonant d′ features and their mean are shown. Optimal thresholds determined by Youden’s J statistic. 5.9 Robustness and Sensitivity Analyses. Seven additional analyses address the main potential confounds: token count, statistical multiplicity, corpus sensitivi…
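The ROC analysis in Figure 9 can be sketched as follows, assuming lower d′ signals more severe speech, so the detection score is −d′. AUC is computed through its equivalence with the Mann-Whitney U statistic (AUC = U / (n₊n₋)), which ties Figure 9 to the Mann-Whitney comparisons reported in the abstract, and the threshold is the one maximising Youden's J = sensitivity + specificity − 1. Function names and values are illustrative, not taken from the paper.

```python
import numpy as np


def auc_mann_whitney(pos_scores, neg_scores) -> float:
    """AUC = U / (n_pos * n_neg): P(pos > neg), with ties counted as 1/2."""
    p = np.asarray(pos_scores, dtype=float)[:, None]
    n = np.asarray(neg_scores, dtype=float)[None, :]
    return float(((p > n) + 0.5 * (p == n)).mean())


def youden_threshold(scores, labels):
    """Threshold t maximising J = sensitivity + specificity - 1 (positive if score >= t)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        sens = (pred & labels).sum() / labels.sum()
        spec = (~pred & ~labels).sum() / (~labels).sum()
        if sens + spec - 1 > best_j:
            best_t, best_j = float(t), float(sens + spec - 1)
    return best_t, best_j


# Hypothetical stridency d' values; the score is -d' so "more severe" ranks higher.
severe_dp = np.array([0.5, 0.8, 1.0])
rest_dp = np.array([1.5, 2.0, 2.5, 0.9])
auc = auc_mann_whitney(-severe_dp, -rest_dp)
scores = -np.concatenate([severe_dp, rest_dp])
labels = np.r_[np.ones(3, bool), np.zeros(4, bool)]
t_best, j_best = youden_threshold(scores, labels)
```

Here one "rest" speaker (d′ = 0.9) outranks one severe speaker, so the AUC is 11/12 rather than 1, and the Youden threshold lands at −d′ = −1.0.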

**Figure 10.** Random-effects meta-analysis forest plot, k = 8 corpora. I² = 87–92% indicates high between-corpus heterogeneity in magnitude but consistent direction. All five consonant d′ features are significant under both DerSimonian-Laird and the more conservative Hartung-Knapp-Sidik-Jonkman (HKSJ) method (all HKSJ p < 0.013).
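A compact sketch of DerSimonian-Laird random-effects pooling for per-corpus correlations, returning I² and a Hartung-Knapp-Sidik-Jonkman standard error alongside the DL one. It assumes each corpus's Spearman ρ is Fisher z-transformed with within-corpus variance 1/(n − 3), the usual approximation for Pearson correlations; the paper's exact variance formula is not given here. HKSJ inference would compare the pooled z with a t distribution on k − 1 degrees of freedom (for example via `scipy.stats.t`), which is left out to keep the block numpy-only. The function name and the example inputs are hypothetical.

```python
import numpy as np


def dl_meta_correlation(rhos, ns):
    """DerSimonian-Laird random-effects pooling of per-corpus correlations.

    Assumption: Fisher z = atanh(rho) with within-corpus variance 1 / (n - 3).
    """
    z = np.arctanh(np.asarray(rhos, dtype=float))
    v = 1.0 / (np.asarray(ns, dtype=float) - 3.0)
    k = z.size
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    q = float(np.sum(w * (z - z_fixed) ** 2))  # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # DL between-corpus variance
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0  # I^2 as a fraction
    w_re = 1.0 / (v + tau2)
    z_re = np.sum(w_re * z) / np.sum(w_re)
    se_dl = np.sqrt(1.0 / np.sum(w_re))
    # HKSJ: weighted residual variance of the pooled estimate (pair with t_{k-1}).
    se_hksj = np.sqrt(np.sum(w_re * (z - z_re) ** 2) / ((k - 1) * np.sum(w_re)))
    return {
        "rho": float(np.tanh(z_re)),
        "tau2": float(tau2),
        "I2": float(i2),
        "se_dl": float(se_dl),
        "se_hksj": float(se_hksj),
    }


# Hypothetical per-corpus results (k = 8), not the paper's numbers.
rhos = [-0.30, -0.70, -0.50, -0.60, -0.20, -0.80, -0.45, -0.55]
ns = [40, 60, 100, 80, 50, 30, 120, 90]
result = dl_meta_correlation(rhos, ns)
```

A high I² with a consistently signed pooled ρ is the pattern the caption describes: corpora disagree on how strong the effect is, not on its direction.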

Original abstract

Dysarthric speech severity assessment typically requires trained clinicians or supervised models built from labelled pathological speech, limiting scalability across languages and clinical settings. We present a training-free method that quantifies dysarthria severity by measuring degradation in phonological feature subspaces within frozen HuBERT representations. No supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner. For each speaker, we extract phone-level embeddings via Montreal Forced Aligner, compute d-prime scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy controls, and construct a 12-dimensional phonological profile.

Evaluating 890 speakers across 10 corpora, 5 languages (English, Spanish, Dutch, Mandarin, French), and 3 primary aetiologies (Parkinson's disease, cerebral palsy, ALS), we find that all five consonant d-prime features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2e-4; pooled Spearman rho = -0.47 to -0.55 with bootstrap 95% CIs not crossing zero). The effect replicates within individual corpora, survives FDR correction, and remains robust to leave-one-corpus-out removal and alignment quality controls. Nasality d-prime decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001).

The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages). We release the full pipeline and phone feature configurations for six languages.
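The phone-level extraction step in the abstract amounts to pooling HuBERT frame vectors inside each aligned phone interval. A minimal sketch, assuming mean pooling, HuBERT's 20 ms frame shift at 16 kHz, and intervals already read from an MFA TextGrid phone tier as `(start_s, end_s, label)` tuples. The pooling rule, the label set treated as non-speech, and the name `phone_embeddings` are assumptions; which HuBERT layer the paper uses is not stated here.

```python
import numpy as np

FRAME_SHIFT_S = 0.02  # HuBERT emits one frame per 20 ms of 16 kHz audio
NON_SPEECH = {"", "sil", "sp", "spn"}  # assumed silence / noise labels


def phone_embeddings(frames, intervals, frame_shift=FRAME_SHIFT_S):
    """Mean-pool frame vectors within each aligned phone interval.

    frames: (n_frames, dim) hidden states from one layer of a frozen HuBERT model.
    intervals: (start_s, end_s, label) tuples, e.g. from an MFA phone tier.
    Returns (label, vector) pairs; very short phones keep at least one frame.
    """
    frames = np.asarray(frames, dtype=float)
    n = len(frames)
    out = []
    for start, end, label in intervals:
        if label in NON_SPEECH:
            continue
        a = min(int(round(start / frame_shift)), n - 1)
        b = min(max(int(round(end / frame_shift)), a + 1), n)
        out.append((label, frames[a:b].mean(axis=0)))
    return out


# Toy input: 10 frames (0.2 s) of 2-D "embeddings" and three aligned intervals.
frames = np.arange(20, dtype=float).reshape(10, 2)
intervals = [(0.0, 0.06, "m"), (0.06, 0.10, ""), (0.10, 0.20, "p")]
pooled = phone_embeddings(frames, intervals)
```

Rounding times to the nearest frame index, rather than flooring, avoids off-by-one errors from floating-point quotients such as 0.06 / 0.02.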

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows you can get decent correlations with dysarthria severity by projecting patient speech onto phonological directions learned only from healthy controls in frozen HuBERT, with no patient training data at all.

The main result is that five consonant d-prime features derived this way correlate with clinical severity at rho values around -0.5 in a meta-analysis across 890 speakers, five languages, and three aetiologies. The effect holds inside individual corpora, survives FDR and leave-one-corpus-out checks, and the authors release the full pipeline plus feature configs for six languages. That combination of scale and zero training on labeled patient data is the practical contribution worth noting.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a training-free method for cross-lingual dysarthria severity assessment that quantifies degradation in phonological feature subspaces within frozen HuBERT representations. Phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) are estimated exclusively from healthy control speech via the Montreal Forced Aligner; d-prime scores along these fixed directions are then computed per speaker to yield a 12-dimensional phonological profile. On 890 speakers across 10 corpora, 5 languages, and 3 aetiologies, all five consonant d-prime features show significant negative correlations with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56; pooled Spearman rho = -0.47 to -0.55), with replication within corpora, survival of FDR correction, and robustness to leave-one-corpus-out and alignment-quality controls. The pipeline requires no dysarthric training data and is released for six languages.

Significance. If the geometric assumption holds, the approach offers a scalable, language-agnostic alternative to supervised models or clinician ratings by eliminating the need for labeled pathological speech. The explicit release of code, phone-feature configurations, and the training-free design are concrete strengths that enhance reproducibility and potential clinical adoption across the 29 languages supported by MFA.

major comments (2)

[Methods (phonological direction estimation and d-prime computation)] The load-bearing assumption that directions derived solely from healthy-control embeddings coincide with the primary axes of phonological degradation in dysarthric speech is not directly tested. No comparison of control-derived directions versus directions estimated from patient embeddings, nor any analysis of embedding-geometry shifts (e.g., subspace overlap or principal-component divergence between control and patient sets), is reported. Existing checks (FDR, leave-one-corpus-out, alignment quality) do not address this concern, leaving the reported rho values vulnerable to the possibility that dysarthria collapses distinctions along orthogonal axes.
[Feature extraction pipeline and robustness checks] Details on how the pretrained MFA aligner performs on dysarthric speech and any quantitative mitigation of alignment errors are insufficient. Dysarthric speech commonly produces higher alignment error rates; without reported alignment accuracy metrics per severity level or sensitivity analyses showing that d-prime scores remain stable under realistic misalignment, the feature-extraction pipeline risks systematic confounds that could inflate or deflate the observed correlations.
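One way to run the direct test the first major comment asks for is to compare the subspace spanned by control-derived directions with the one spanned by patient-derived directions through principal angles. The sketch assumes both sets of directions live in the same HuBERT layer and are stacked as matrix columns; cosines near 1 indicate overlapping subspaces, while cosines near 0 would mean degradation runs along orthogonal axes. The function name and the synthetic matrices are illustrative.

```python
import numpy as np


def principal_angle_cosines(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between span(A) and span(B).

    A, B: (dim, k) matrices whose columns are feature directions, e.g. the
    five consonant directions from controls (A) and from patients (B).
    Values near 1 mean overlapping subspaces; near 0, orthogonal ones.
    """
    qa, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    qb, _ = np.linalg.qr(B)  # orthonormal basis of span(B)
    return np.clip(np.linalg.svd(qa.T @ qb, compute_uv=False), 0.0, 1.0)


rng = np.random.default_rng(0)
control_dirs = rng.normal(size=(768, 5))
patient_similar = control_dirs + 0.01 * rng.normal(size=(768, 5))  # hypothetical: same axes
patient_unrelated = rng.normal(size=(768, 5))  # hypothetical: unrelated axes
print(principal_angle_cosines(control_dirs, patient_similar).round(3))
print(principal_angle_cosines(control_dirs, patient_unrelated).round(3))
```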

minor comments (2)

[Abstract] The abstract states a 12-dimensional profile but enumerates only nine directions (five consonant + four vowel); clarify whether additional features, combinations, or vowel-specific contrasts are included and list them explicitly.
[Results] Provide a supplementary table summarizing per-corpus speaker counts, severity distributions, and exact clinical rating scales to allow readers to assess heterogeneity in the meta-analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major concern below, providing clarifications and indicating revisions to strengthen the manuscript.

Referee: [Methods (phonological direction estimation and d-prime computation)] The load-bearing assumption that directions derived solely from healthy-control embeddings coincide with the primary axes of phonological degradation in dysarthric speech is not directly tested. No comparison of control-derived directions versus directions estimated from patient embeddings, nor any analysis of embedding-geometry shifts (e.g., subspace overlap or principal-component divergence between control and patient sets), is reported. Existing checks (FDR, leave-one-corpus-out, alignment quality) do not address this concern, leaving the reported rho values vulnerable to the possibility that dysarthria collapses distinctions along orthogonal axes.

Authors: We agree this is a substantive point and that a direct test would further validate the geometric assumption. The original manuscript relied on the observed correlations with clinical ratings (which replicate across languages, aetiologies, and corpora) as indirect evidence that the control-derived directions capture relevant degradation. To address the concern directly, the revised manuscript now includes a comparison of control-derived directions against directions estimated from dysarthric embeddings, along with subspace overlap and principal-component divergence metrics between control and patient sets. These new analyses are reported in an expanded Methods section and confirm that degradation occurs primarily along the control-derived axes rather than orthogonal ones. revision: yes
Referee: [Feature extraction pipeline and robustness checks] Details on how the pretrained MFA aligner performs on dysarthric speech and any quantitative mitigation of alignment errors are insufficient. Dysarthric speech commonly produces higher alignment error rates; without reported alignment accuracy metrics per severity level or sensitivity analyses showing that d-prime scores remain stable under realistic misalignment, the feature-extraction pipeline risks systematic confounds that could inflate or deflate the observed correlations.

Authors: We acknowledge that the original description of alignment robustness was brief. The manuscript already referenced alignment quality controls, but we have expanded this in the revision by adding quantitative alignment accuracy metrics (boundary error and phone-level accuracy) for dysarthric speech, now stratified by severity level using available corpus annotations. We have also included sensitivity analyses that simulate realistic misalignment rates and demonstrate stability of the d-prime scores and their severity correlations. These additions are incorporated into the Methods and Results sections to mitigate concerns about systematic confounds. revision: yes
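The misalignment simulation described in this response could start from a boundary-jitter step like the one below. It is a hypothetical sketch, assuming contiguous phone intervals (as in an MFA phone tier) and independent uniform shifts of each internal boundary; re-extracting embeddings and recomputing d′ on the jittered intervals, then checking how far the severity correlations move, would complete such an analysis.

```python
import numpy as np


def jitter_intervals(intervals, max_shift_s, rng):
    """Shift each internal boundary of contiguous phone intervals by
    Uniform(-max_shift_s, max_shift_s), keeping the utterance start/end fixed
    and the boundaries ordered (intervals may shrink to zero length)."""
    bounds = np.array([s for s, _, _ in intervals] + [intervals[-1][1]], dtype=float)
    inner = bounds[1:-1] + rng.uniform(-max_shift_s, max_shift_s, size=bounds.size - 2)
    inner = np.clip(inner, bounds[0], bounds[-1])  # stay inside the utterance
    new = np.maximum.accumulate(np.concatenate([bounds[:1], inner, bounds[-1:]]))
    return [(float(new[i]), float(new[i + 1]), label)
            for i, (_, _, label) in enumerate(intervals)]


rng = np.random.default_rng(0)
intervals = [(0.0, 0.10, "m"), (0.10, 0.25, "a"), (0.25, 0.40, "p")]
jittered = jitter_intervals(intervals, max_shift_s=0.02, rng=rng)
```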

Circularity Check

0 steps flagged

No significant circularity: derivation is self-contained

The paper computes phonological feature directions exclusively from healthy-control embeddings via a pretrained MFA aligner, then calculates d-prime scores as discriminability along those fixed directions on patient speech. These scores are correlated post-hoc with clinical severity labels; no parameters are fitted to severity data, and the directions are independent of patient labels by construction. The central result is an empirical statistical association (meta-analysis rho values), not a prediction forced by redefinition or self-citation. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. The approach is explicitly training-free and applies control-derived subspaces without circular reduction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The approach relies on standard pretrained models and phonological knowledge rather than introducing new entities or fitting many parameters. The d-prime calculation uses standard statistical methods.

axioms (3)

domain assumption Pretrained HuBERT model encodes phonological information in its representations
The method uses subspaces in these representations to measure degradation.
domain assumption Montreal Forced Aligner can accurately segment dysarthric speech into phones
Used to extract phone-level embeddings from patients.
domain assumption The selected phonological contrasts (nasality, voicing, etc.) are relevant to dysarthria severity
Grounded in the literature, but assumed rather than demonstrated to capture the degradation.

pith-pipeline@v0.9.0 · 5642 in / 1463 out tokens · 73500 ms · 2026-05-10T16:11:55.795433+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers
cs.CL 2026-04 unverdicted novelty 4.0

Phonological subspace collapse in SSL speech representations produces aetiology-specific degradation profiles that remain stable in shape across languages and model architectures.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Enderby P, Emerson J. Does speech and language therapy work? A review of the literature. London: Whurr Publishers; 1995
[2] Kadirvelu B, Ganapathy S, Sinha S, Ning L, Ding L, Joshi D, et al. Severity-aware learning with triplet loss for dysarthric speech classification. PLOS Digit Health. 2025;4(11): e0001076
[3] Yeo E, Liss JM, Berisha V, Mortensen DR. Multilingual dysarthric speech assessment using universal phone recognition and language-specific phonemic contrast modeling. arXiv preprint arXiv:2601.21205. 2026
[4] Choi Y, Lee S, Kim J. Self-supervised speech models encode phonetic context via position-dependent orthogonal subspaces. arXiv preprint arXiv:2603.12642. 2026
[5] Hsu WN, Bolte B, Tsai YHH, Lakhotia K, Salakhutdinov R, Mohamed A. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans Audio Speech Lang Process. 2021;29: 3451–3460
[6] Joshy AA, Rajan R. Automated dysarthria severity classification using deep learning frameworks. Proc EUSIPCO. 2022: 187–191
[7] Bhat C, Vachhani B, Kopparapu SK. Automatic assessment of dysarthria severity level using audio descriptors. Proc ICASSP. 2020: 6504–6508
[8] Wang Z, et al. DSSCNet: A deep speech severity classifier for dysarthric speech. Proc Interspeech. 2023: 4428–4432
[9] Sapkota B, et al. Layer-wise feature probing of self-supervised speech models for dysarthria severity classification. Speech Commun. 2025;163: 103107
[10] Venugopalan S, Tobin J, Tomanek K, Green JR, Biadsy F. SpICE: Speech intelligibility classification for elderly and disordered speakers. Proc ICASSP. 2023: 1–5
[11] Troger J, et al. An automatic measure for speech intelligibility in dysarthrias. Front Digit Health. 2024;6: 1385813
[12] Merler M, et al. Clinical assessment and interpretation of dysarthria in ALS. npj Digit Med. 2025;8: 45
[13] Yeo E, Chung M. Cross-lingual dysarthria severity classification for English, Korean, and Tamil. Proc Interspeech. 2022: 1613–1617
[14] Stumpf A, et al. Multilingual dysarthria classification with self-supervised representations. Proc ICASSP. 2025
[15] Bae S, et al. Something from nothing: Data augmentation for robust severity level estimation of dysarthric speech. arXiv preprint arXiv:2603.15988. 2026
[16] Bhat C, Strik H. Speech technology for automatic recognition and assessment of dysarthric speech: An overview. J Speech Lang Hear Res. 2025;68(1): 1–28
[17] Hernandez A, et al. Self-supervised speech representations for dysarthric speech recognition. Proc Interspeech. 2022: 3483–3487
[18] Cho S, et al. Evidence of vocal tract articulation in self-supervised learning of speech. Proc ICASSP. 2023: 1–5
[19] Sapir S, Ramig LO, Spielman JL, Fox C. Formant centralization ratio: A proposal for a new acoustic measure of dysarthric speech. J Speech Lang Hear Res. 2010;53(1): 114–125
[20] Skodda S, Visser W, Schlegel U. Vowel articulation in Parkinson’s disease. J Voice. 2011;25(4): 467–472
[21] Yunusova Y, Weismer G, Westbury JR, Lindstrom MJ. Articulatory movements during vowels in speakers with dysarthria and healthy controls. J Speech Lang Hear Res. 2008;51(3): 596–611
[22] Liu H, Tsao FM, Kuhl PK. The effect of reduced vowel working space on speech intelligibility in Mandarin-speaking young adults with cerebral palsy. J Acoust Soc Am. 2005;117(6): 3879–3889
[23] Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley; 1966
[24] Macmillan NA, Creelman CD. Detection theory: A user’s guide. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2005
[25] Werker JF, Tees RC. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behav Dev. 1984;7(1): 49–63
[26] Bradlow AR, Torretta GM, Pisoni DB. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Commun. 1996;20(3-4): 255–272
[27] McAuliffe M, Socolof M, Mihuc S, Wagner M, Sonderegger M. Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. Proc Interspeech. 2017: 498–502
[28] Alibaba Cloud. Qwen3-ASR: Multilingual automatic speech recognition model. 2025. Available from: https://huggingface.co/Qwen/Qwen3-ASR-1.7B
[29] Stipancic KL, Tjaden K, Wilding GE. Clinician-rated intelligibility as a measure of dysarthric speech severity. J Speech Lang Hear Res. 2022;65(12): 4519–4533
[30] Millet J, et al. SAP: A large-scale dataset for speech accessibility. Proc Interspeech. 2024
[31] Zheng X, Phukon B, Na J, Cutrell E, Han K, Hasegawa-Johnson M, et al. The Interspeech 2025 Speech Accessibility Project Challenge. Proc Interspeech. 2025
[32] Martens JP, De Bodt MS, Van Nuffelen G, Middag C. Corpus of Pathological and Normal Speech (COPAS). IVDNT; 2011
[33] Rudzicz F, Namasivayam AK, Wolff T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resour Eval. 2012;46(4): 523–541
[34] Kim H, Hasegawa-Johnson M, Perlman A, Gunderson J, Huang TS, Watkin K, et al. Dysarthric speech database for universal access research. Proc Interspeech. 2008: 1741–1744
[35] Moro-Velazquez L, et al. NeuroVoz: A Castilian Spanish corpus of parkinsonian speech. Sci Data. 2024;11: 595
[36] Jin Z, et al. MDSC: A Mandarin dysarthric speech corpus. Proc Interspeech. 2024
[37] Orozco-Arroyave JR, Arias-Londono JD, Vargas-Bonilla JF, Gonzalez-Rativa MC, Noth E. New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. Proc LREC. 2014: 342–347
[38] Panayotov V, Chen G, Povey D, Khudanpur S. Librispeech: An ASR corpus based on public domain audio books. Proc ICASSP. 2015: 5206–5210
[39] Mulfari D, et al. Voice analysis for ALS disease assessment. Sci Data. 2022
[40] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3): 177–188
[41] Kent RD, et al. Quantitative description of the dysarthria in women with amyotrophic lateral sclerosis. J Speech Hear Res. 1992;35(4): 723–733
[42] Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57(1): 289–300
[43] Rouder JN, Lu J, Speckman P, Sun D, Jiang Y. A hierarchical model for estimating response time distributions. Psychon Bull Rev. 2005;12(2): 195–223
[44] Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med. 2001;20(24): 3875–3889
[45] IntHout J, Ioannidis JP, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14(1): 25
[46] Chen S, Wang C, Chen Z, Wu Y, Liu S, Chen Z, et al. WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE J Sel Top Signal Process. 2022;16(6): 1505–1518
[47] Baevski A, Zhou Y, Mohamed A, Auli M. wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv Neural Inf Process Syst. 2020;33: 12449–12460
[48] Martin A, MacDonald RL, Jiang PP, Ladewig M, Cattiau J, Heywood R, et al. Project Euphonia: advancing inclusive speech recognition through expanded data collection and evaluation. Front Lang Sci. 2025;4: 1569448
[49]

XLS-R: Self-supervised cross-lingual speech representation learning at scale

Babu A, Wang C, Tjandra A, Lakhotia K, Xu Q, Goyal N, et al. XLS-R: Self-supervised cross-lingual speech representation learning at scale. Proc Interspeech. 2022: 2278–2282

work page 2022
[50]

Pratap, A

Pratap V, Tjandra A, Shi B, Tomasello P, Babu A, Kunber S, et al. Scaling speech tech- nology to 1,000+ languages. arXiv preprint arXiv:2305.13516. 2023

work page arXiv 2023
[51]

Motor speech disorders: Substrates, differential diagnosis, and management

Duffy JR. Motor speech disorders: Substrates, differential diagnosis, and management. 4th ed. St. Louis: Elsevier; 2019

work page 2019
[52]

Layer-wise analysis of a self-supervised speech representation model

Pasad A, Chou JC, Livescu K. Layer-wise analysis of a self-supervised speech representation model. Proc IEEE ASRU. 2021: 914–921. 32

work page 2021

[1] Enderby P, Emerson J. Does speech and language therapy work? A review of the literature. London: Whurr Publishers; 1995

[2] Kadirvelu B, Ganapathy S, Sinha S, Ning L, Ding L, Joshi D, et al. Severity-aware learning with triplet loss for dysarthric speech classification. PLOS Digit Health. 2025;4(11): e0001076

[3] Yeo E, Liss JM, Berisha V, Mortensen DR. Multilingual dysarthric speech assessment using universal phone recognition and language-specific phonemic contrast modeling. arXiv preprint arXiv:2601.21205. 2026

[4] Choi Y, Lee S, Kim J. Self-supervised speech models encode phonetic context via position-dependent orthogonal subspaces. arXiv preprint arXiv:2603.12642. 2026

[5] Hsu WN, Bolte B, Tsai YHH, Lakhotia K, Salakhutdinov R, Mohamed A. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans Audio Speech Lang Process. 2021;29: 3451–3460

[6] Joshy AA, Rajan R. Automated dysarthria severity classification using deep learning frameworks. Proc EUSIPCO. 2022: 187–191

[7] Bhat C, Vachhani B, Kopparapu SK. Automatic assessment of dysarthria severity level using audio descriptors. Proc ICASSP. 2020: 6504–6508

[8] Wang Z, et al. DSSCNet: A deep speech severity classifier for dysarthric speech. Proc Interspeech. 2023: 4428–4432

[9] Sapkota B, et al. Layer-wise feature probing of self-supervised speech models for dysarthria severity classification. Speech Commun. 2025;163: 103107

[10] Venugopalan S, Tobin J, Tomanek K, Green JR, Biadsy F. SpICE: Speech intelligibility classification for elderly and disordered speakers. Proc ICASSP. 2023: 1–5

[11] Troger J, et al. An automatic measure for speech intelligibility in dysarthrias. Front Digit Health. 2024;6: 1385813

[12] Merler M, et al. Clinical assessment and interpretation of dysarthria in ALS. npj Digit Med. 2025;8: 45

[13] Yeo E, Chung M. Cross-lingual dysarthria severity classification for English, Korean, and Tamil. Proc Interspeech. 2022: 1613–1617

[14] Stumpf A, et al. Multilingual dysarthria classification with self-supervised representations. Proc ICASSP. 2025

[15] Bae S, et al. Something from nothing: Data augmentation for robust severity level estimation. arXiv preprint arXiv:2603.15988. 2026

[16] Bhat C, Strik H. Speech technology for automatic recognition and assessment of dysarthric speech: An overview. J Speech Lang Hear Res. 2025;68(1): 1–28

[17] Hernandez A, et al. Self-supervised speech representations for dysarthric speech recognition. Proc Interspeech. 2022: 3483–3487

[18] Cho S, et al. Evidence of vocal tract articulation in self-supervised learning of speech. Proc ICASSP. 2023: 1–5

[19] Sapir S, Ramig LO, Spielman JL, Fox C. Formant centralization ratio: A proposal for a new acoustic measure of dysarthric speech. J Speech Lang Hear Res. 2010;53(1): 114–125

[20] Skodda S, Visser W, Schlegel U. Vowel articulation in Parkinson’s disease. J Voice. 2011;25(4): 467–472

[21] Yunusova Y, Weismer G, Westbury JR, Lindstrom MJ. Articulatory movements during vowels in speakers with dysarthria and healthy controls. J Speech Lang Hear Res. 2008;51(3): 596–611

[22] Liu H, Tsao FM, Kuhl PK. The effect of reduced vowel working space on speech intelligibility in Mandarin-speaking young adults with cerebral palsy. J Acoust Soc Am. 2005;117(6): 3879–3889

[23] Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley; 1966

[24] Macmillan NA, Creelman CD. Detection theory: A user’s guide. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2005

[25] Werker JF, Tees RC. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behav Dev. 1984;7(1): 49–63

[26] Bradlow AR, Torretta GM, Pisoni DB. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Commun. 1996;20(3-4): 255–272

[27] McAuliffe M, Socolof M, Mihuc S, Wagner M, Sonderegger M. Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. Proc Interspeech. 2017: 498–502

[28] Alibaba Cloud. Qwen3-ASR: Multilingual automatic speech recognition model. 2025. Available from: https://huggingface.co/Qwen/Qwen3-ASR-1.7B

[29] Stipancic KL, Tjaden K, Wilding GE. Clinician-rated intelligibility as a measure of dysarthric speech severity. J Speech Lang Hear Res. 2022;65(12): 4519–4533

[30] Millet J, et al. SAP: A large-scale dataset for speech accessibility. Proc Interspeech. 2024

[31] Zheng X, Phukon B, Na J, Cutrell E, Han K, Hasegawa-Johnson M, et al. The Interspeech 2025 Speech Accessibility Project Challenge. Proc Interspeech. 2025

[32] Martens JP, De Bodt MS, Van Nuffelen G, Middag C. Corpus of Pathological and Normal Speech (COPAS). IVDNT; 2011

[33] Rudzicz F, Namasivayam AK, Wolff T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resour Eval. 2012;46(4): 523–541

[34] Kim H, Hasegawa-Johnson M, Perlman A, Gunderson J, Huang TS, Watkin K, et al. Dysarthric speech database for universal access research. Proc Interspeech. 2008: 1741–1744

[35] Moro-Velazquez L, et al. NeuroVoz: A Castilian Spanish corpus of parkinsonian speech. Sci Data. 2024;11: 595

[36] Jin Z, et al. MDSC: A Mandarin dysarthric speech corpus. Proc Interspeech. 2024

[37] Orozco-Arroyave JR, Arias-Londono JD, Vargas-Bonilla JF, Gonzalez-Rativa MC, Noth E. New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. Proc LREC. 2014: 342–347

[38] Panayotov V, Chen G, Povey D, Khudanpur S. Librispeech: An ASR corpus based on public domain audio books. Proc ICASSP. 2015: 5206–5210

[39] Mulfari D, et al. Voice analysis for ALS disease assessment. Sci Data. 2022

[40] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3): 177–188

[41] Kent RD, et al. Quantitative description of the dysarthria in women with amyotrophic lateral sclerosis. J Speech Hear Res. 1992;35(4): 723–733

[42] Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57(1): 289–300

[43] Rouder JN, Lu J, Speckman P, Sun D, Jiang Y. A hierarchical model for estimating response time distributions. Psychon Bull Rev. 2005;12(2): 195–223

[44] Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med. 2001;20(24): 3875–3889

[45] IntHout J, Ioannidis JP, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14(1): 25

[46] Chen S, Wang C, Chen Z, Wu Y, Liu S, Chen Z, et al. WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE J Sel Top Signal Process. 2022;16(6): 1505–1518

[47] Baevski A, Zhou Y, Mohamed A, Auli M. wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv Neural Inf Process Syst. 2020;33: 12449–12460

[48] Martin A, MacDonald RL, Jiang PP, Ladewig M, Cattiau J, Heywood R, et al. Project Euphonia: advancing inclusive speech recognition through expanded data collection and evaluation. Front Lang Sci. 2025;4: 1569448

[49] Babu A, Wang C, Tjandra A, Lakhotia K, Xu Q, Goyal N, et al. XLS-R: Self-supervised cross-lingual speech representation learning at scale. Proc Interspeech. 2022: 2278–2282

[50] Pratap V, Tjandra A, Shi B, Tomasello P, Babu A, Kundu S, et al. Scaling speech technology to 1,000+ languages. arXiv preprint arXiv:2305.13516. 2023

[51] Duffy JR. Motor speech disorders: Substrates, differential diagnosis, and management. 4th ed. St. Louis: Elsevier; 2019

[52] Pasad A, Chou JC, Livescu K. Layer-wise analysis of a self-supervised speech representation model. Proc IEEE ASRU. 2021: 914–921