arxiv: 2604.10123 · v1 · submitted 2026-04-11 · 💻 cs.CL · cs.LG

Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

Pith reviewed 2026-05-10 16:11 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords dysarthria severity · phonological features · self-supervised speech · training-free assessment · cross-lingual · d-prime scores · forced alignment · HuBERT embeddings

The pith

Dysarthria severity can be measured from degradation along phonological contrast directions in frozen speech representations, with directions defined only from healthy control speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a training-free approach to quantify dysarthria severity by tracking how specific phonological features degrade in pretrained speech embeddings. Feature directions for contrasts such as nasality, voicing, and manner are estimated exclusively from healthy speakers using a forced aligner, then applied to patient embeddings to compute d-prime scores that form a 12-dimensional profile. These scores show consistent negative correlations with clinical severity ratings across 890 speakers, five languages, and three aetiologies, with effects that hold after multiple robustness checks. A sympathetic reader would care because the method needs no labelled dysarthric data and works for any language that already has an acoustic alignment model, removing a major barrier to scalable clinical assessment.

Core claim

By extracting phone-level embeddings from frozen HuBERT representations and computing d-prime scores along phonological feature directions estimated solely from healthy control speech, the resulting 12-dimensional profiles correlate significantly with clinical severity (random-effects meta-analysis rho from -0.50 to -0.56), with all five consonant features surviving multiple testing corrections and remaining stable under leave-one-corpus-out validation.
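The random-effects pooling mentioned in the core claim can be illustrated with a minimal sketch of the DerSimonian-Laird estimator applied to Fisher z-transformed correlations. The function name `dersimonian_laird`, the 1/(n - 3) variance approximation, and the input values are assumptions made for illustration; the summary does not state the paper's exact variance formula, and the numbers below are invented rather than taken from the paper.

```python
import math

def dersimonian_laird(rhos, ns):
    """Pool per-corpus correlations with a DerSimonian-Laird random-effects model.

    Needs at least two corpora. Variance 1/(n-3) is an assumed approximation.
    """
    z = [math.atanh(r) for r in rhos]          # Fisher z-transform
    v = [1.0 / (n - 3) for n in ns]            # within-corpus variances
    w = [1.0 / vi for vi in v]
    k = len(z)

    # Fixed-effect estimate and Cochran's Q heterogeneity statistic.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))

    # Method-of-moments between-corpus variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to every within-corpus variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    return {
        "rho": math.tanh(z_re),
        "ci95": (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se)),
        "tau2": tau2,
        "I2": i2,
    }

# Invented per-corpus correlations and speaker counts, for illustration only.
pooled = dersimonian_laird(rhos=[-0.35, -0.62, -0.48, -0.70], ns=[40, 120, 64, 90])
print(pooled)
```

A high I2 with a pooled interval that excludes zero would correspond to the pattern the summary describes: corpora disagree on magnitude but agree on direction.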

What carries the argument

D-prime scores computed along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) that are derived exclusively from healthy control speech within frozen HuBERT embeddings; these scores quantify per-speaker degradation relative to the healthy reference subspace.
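As a rough illustration of the mechanism, the sketch below estimates one contrast direction from healthy-control tokens and then scores both a control and a patient along that fixed direction. It assumes the direction is the normalised difference of class means and that d′ uses the pooled standard deviation of the two projected classes; the summary does not confirm either choice, and all helper names and the synthetic 3-D data are hypothetical.

```python
import math
import random

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def contrast_direction(pos_ctrl, neg_ctrl):
    """Unit vector between class means, estimated from healthy controls only.
    The mean-difference estimator is an assumption of this sketch."""
    diff = [a - b for a, b in zip(mean_vector(pos_ctrl), mean_vector(neg_ctrl))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def project(vectors, direction):
    """Scalar projection of each embedding onto the contrast direction."""
    return [sum(x * w for x, w in zip(v, direction)) for v in vectors]

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def d_prime(pos_tokens, neg_tokens, direction):
    """d' = (mean_pos - mean_neg) / sqrt((var_pos + var_neg) / 2) along the direction."""
    p, q = project(pos_tokens, direction), project(neg_tokens, direction)
    pooled_sd = math.sqrt((sample_variance(p) + sample_variance(q)) / 2)
    return (sum(p) / len(p) - sum(q) / len(q)) / pooled_sd

# Synthetic 3-D stand-ins for phone embeddings; real HuBERT vectors are far larger.
rng = random.Random(0)

def cloud(center, n=200, sd=1.0):
    return [[rng.gauss(c, sd) for c in center] for _ in range(n)]

ctrl_nasal, ctrl_oral = cloud([2, 0, 0]), cloud([-2, 0, 0])
pat_nasal, pat_oral = cloud([0.5, 0, 0]), cloud([-0.5, 0, 0])  # collapsed contrast

direction = contrast_direction(ctrl_nasal, ctrl_oral)   # estimated from controls only
d_control = d_prime(ctrl_nasal, ctrl_oral, direction)
d_patient = d_prime(pat_nasal, pat_oral, direction)     # same fixed direction
print(round(d_control, 2), round(d_patient, 2))
```

Repeating this for each phonological contrast would give one score per feature, which is how a per-speaker profile could be assembled under these assumptions.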

If this is right

  • The same healthy-control directions produce reliable severity correlations in each of the five tested languages without any language-specific retraining.
  • Nasality d-prime decreases monotonically with increasing severity in six of the seven severity-graded corpora.
  • All twelve phonological features separate healthy controls from severely dysarthric speakers at p < 0.001.
  • The pipeline can be deployed for any of the 29 languages that already have a Montreal Forced Aligner model.
  • No dysarthric speech data is required to build or adapt the severity estimator for a new clinical setting.
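The control-versus-severe comparison in the list above rests on a Mann-Whitney U test. A minimal sketch follows, using pair counting and a normal approximation without tie correction; both are simplifications chosen for brevity, and the d′ values are invented rather than drawn from any corpus.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) for samples x and y via the normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal tail
    return u, p

# Invented per-speaker d' values for one feature.
controls = [3.2, 2.9, 3.5, 3.0, 2.7, 3.3]
severe = [1.1, 1.8, 0.9, 2.8, 1.5, 1.2]

u_stat, p_value = mann_whitney_u(controls, severe)
auc = u_stat / (len(controls) * len(severe))  # U rescaled to [0, 1] equals the ROC AUC
print(u_stat, p_value, auc)
```

For small groups an exact test would be preferable to the normal approximation used here.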

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stability of the correlations under leave-one-corpus-out removal suggests the phonological degradation signal is not tied to any single recording condition or disease subtype.
  • Because the method separates controls from severe cases across aetiologies, it could serve as an initial screening layer before more detailed clinical evaluation.
  • Extending the set of phonological directions or testing other frozen self-supervised models might strengthen the observed correlations without introducing supervised training.
  • The requirement for only an existing aligner model implies the approach could transfer quickly to additional languages once their acoustic models become available.

Load-bearing premise

Phonological feature directions estimated only from healthy control speech using a pretrained forced aligner still capture the main degradation patterns that occur in dysarthric speech across languages and disease types.

What would settle it

Observing no significant correlation between the d-prime phonological scores and independent clinical severity ratings in a new corpus from an additional language or aetiology, after applying the same healthy-control direction estimation, would falsify the central claim.
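The falsification test described here amounts to computing a rank correlation between d′ and ordinal severity in a held-out corpus and checking whether its confidence interval includes zero. The sketch below is one way to do that, with tie-aware Spearman ρ and a percentile bootstrap; the function names, the seed, and the eight-speaker toy data are assumptions for illustration.

```python
import math
import random

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bootstrap_ci(x, y, n_boot=1000, seed=0, alpha=0.05):
    """Percentile bootstrap interval over speakers."""
    rng = random.Random(seed)
    n, stats = len(x), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with no variation
        stats.append(spearman(xs, ys))
    stats.sort()
    m = len(stats)
    return stats[int(alpha / 2 * m)], stats[max(0, int((1 - alpha / 2) * m) - 1)]

# Invented data: severity coded control = 0 ... severe = 3, one d' value per speaker.
severity = [0, 0, 1, 1, 2, 2, 3, 3]
dprime = [3.1, 2.9, 2.5, 2.6, 1.8, 2.0, 1.1, 0.9]

rho = spearman(severity, dprime)
lo, hi = bootstrap_ci(severity, dprime)
print(rho, lo, hi)
```

An interval that straddles zero in a genuinely new language or aetiology would be the kind of evidence this section says could undermine the claim.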

Figures

Figures reproduced from arXiv: 2604.10123 by Antonio Armando Ortiz Barrañón, Bernard Muller, LaVonne Roberts.

Figure 1. Phonological subspace collapse in dysarthric speech. UMAP projection of phone-level HuBERT embeddings for one healthy control speaker (MC01, left) and one speaker with severe dysarthria (M01, right) from the TORGO corpus. Points represent individual phone tokens coloured by class: nasal consonants (m, n, ng) vs. oral stops (p, b, t, d, k, g). In the healthy speaker, the two classes form tight, well-separa…
Figure 2. Clinical validation: d′ correlates with clinician-rated intelligibility across languages. Left: stridency d′ vs intelligibility (Kendall’s τ = 0.407, p = 2.9 × 10⁻⁶, n = 64). Right: mean consonant d′ vs intelligibility (τ = 0.323, p = 3.9 × 10⁻⁵, n = 86). Data from UA-Speech (English, cerebral palsy) and COPAS (Dutch, mixed aetiologies). Trendlines show positive linear association. We test this hypoth…
Figure 3. Left panel: group means for all reported features across control/mild/moderate/severe stages (…
Figure 4. Consonant d′ degradation by severity across corpora. Each panel displays one of the five consonant d′ features (nasality, sonorant, voicing, strident, manner) with speaker-level means plotted across severity groups (control, mild, moderate, severe) for each corpus independently. Lines connect severity group means within each corpus. All five features show a consistent downward trend from control to sever…
Figure 5. Severity correlation forest plot for all 12 phonological features. Spearman ρ between each feature and ordinal severity (control = 0, mild = 1, moderate = 2, severe = 3), pooled across all corpora. Horizontal bars show bootstrap 95% confidence intervals (1,000 iterations). Features are sorted by effect size. Dark navy: consonant d′ features; light blue: vowel d′ features; orange: structural metrics. All …
Figure 6. Consonant d′ profiles across 867 speakers sorted by severity. Each column is one speaker; each row is one consonant d′ feature. Speakers are sorted left to right by severity (control, mild, moderate, severe), then by mean d′ within each group. Top strips show severity (green to red) and aetiology (colour-coded). The green-to-red gradient across all five features demonstrates that phonological subspace d…
Figure 7. Phonological fingerprints by aetiology. Radar plots showing the mean consonant d′ profile for cerebral palsy (CP, n = 86), Parkinson’s disease (PD, n = 211), and amyotrophic lateral sclerosis (ALS, n = 50) speakers. Each axis represents one consonant d′ feature; the green shaded area shows the healthy control reference profile. CP shows the most uniform degradation across all features, while PD and ALS e…
Figure 8. visualises this relationship. This occurs because d′ estimation is biased upward with more observations – the sample means more closely approximate the true population means when n is large, reducing the pooled standard deviation relative to the mean difference. The practical consequence is that absolute d′ values are not comparable across corpora with different amounts of speech per speaker. For example…
Figure 9. ROC curves for binary severity detection. Left: severe vs. rest (stridency d′ AUC = 0.890). Right: moderate-or-worse vs. mild/control. All five consonant d′ features and their mean are shown. Optimal thresholds determined by Youden’s J statistic. 5.9 Robustness and Sensitivity Analyses Seven additional analyses address the main potential confounds: token count, statistical multiplicity, corpus sensitivi…
Figure 10. Random-effects meta-analysis forest plot, k = 8 corpora. I² = 87–92% indicates high between-corpus heterogeneity in magnitude but consistent direction. All five consonant d′ features are significant under both DerSimonian-Laird and the more conservative Hartung-Knapp-Sidik-Jonkman estimator (all HKSJ p < 0.013). All five consonant features are significant under both DL and the more conservative HKSJ inf…
Original abstract

Dysarthric speech severity assessment typically requires trained clinicians or supervised models built from labelled pathological speech, limiting scalability across languages and clinical settings. We present a training-free method that quantifies dysarthria severity by measuring degradation in phonological feature subspaces within frozen HuBERT representations. No supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner. For each speaker, we extract phone-level embeddings via Montreal Forced Aligner, compute d-prime scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy controls, and construct a 12-dimensional phonological profile. Evaluating 890 speakers across 10 corpora, 5 languages (English, Spanish, Dutch, Mandarin, French), and 3 primary aetiologies (Parkinson's disease, cerebral palsy, ALS), we find that all five consonant d-prime features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2e-4; pooled Spearman rho = -0.47 to -0.55 with bootstrap 95% CIs not crossing zero). The effect replicates within individual corpora, survives FDR correction, and remains robust to leave-one-corpus-out removal and alignment quality controls. Nasality d-prime decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages). We release the full pipeline and phone feature configurations for six languages.
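The abstract reports that the feature-level effects survive FDR correction. A minimal sketch of the Benjamini-Hochberg step-up procedure is given below, on the assumption that this is the FDR method meant. The function name `benjamini_hochberg` and the four p-values are illustrative and not taken from the paper.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return (adjusted p-values, reject flags) under Benjamini-Hochberg FDR control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [a <= q for a in adjusted]

# Invented raw p-values, one per phonological feature.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted, reject = benjamini_hochberg(raw)
print(adjusted, reject)
```

A feature "survives" correction in this sense when its adjusted p-value stays at or below the chosen FDR level q.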

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a training-free method for cross-lingual dysarthria severity assessment that quantifies degradation in phonological feature subspaces within frozen HuBERT representations. Phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) are estimated exclusively from healthy control speech via the Montreal Forced Aligner; d-prime scores along these fixed directions are then computed per speaker to yield a 12-dimensional phonological profile. On 890 speakers across 10 corpora, 5 languages, and 3 aetiologies, all five consonant d-prime features show significant negative correlations with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56; pooled Spearman rho = -0.47 to -0.55), with replication within corpora, survival of FDR correction, and robustness to leave-one-corpus-out and alignment-quality controls. The pipeline requires no dysarthric training data and is released for six languages.

Significance. If the geometric assumption holds, the approach offers a scalable, language-agnostic alternative to supervised models or clinician ratings by eliminating the need for labeled pathological speech. The explicit release of code, phone-feature configurations, and the training-free design are concrete strengths that enhance reproducibility and potential clinical adoption across the 29 languages supported by MFA.

major comments (2)
  1. [Methods (phonological direction estimation and d-prime computation)] The load-bearing assumption that directions derived solely from healthy-control embeddings coincide with the primary axes of phonological degradation in dysarthric speech is not directly tested. No comparison of control-derived directions versus directions estimated from patient embeddings, nor any analysis of embedding-geometry shifts (e.g., subspace overlap or principal-component divergence between control and patient sets), is reported. Existing checks (FDR, leave-one-corpus-out, alignment quality) do not address this concern, leaving the reported rho values vulnerable to the possibility that dysarthria collapses distinctions along orthogonal axes.
  2. [Feature extraction pipeline and robustness checks] Details on how the pretrained MFA aligner performs on dysarthric speech and any quantitative mitigation of alignment errors are insufficient. Dysarthric speech commonly produces higher alignment error rates; without reported alignment accuracy metrics per severity level or sensitivity analyses showing that d-prime scores remain stable under realistic misalignment, the feature-extraction pipeline risks systematic confounds that could inflate or deflate the observed correlations.
minor comments (2)
  1. [Abstract] The abstract states a 12-dimensional profile but enumerates only nine directions (five consonant + four vowel); clarify whether additional features, combinations, or vowel-specific contrasts are included and list them explicitly.
  2. [Results] Provide a supplementary table summarizing per-corpus speaker counts, severity distributions, and exact clinical rating scales to allow readers to assess heterogeneity in the meta-analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major concern below, providing clarifications and indicating revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Methods (phonological direction estimation and d-prime computation)] The load-bearing assumption that directions derived solely from healthy-control embeddings coincide with the primary axes of phonological degradation in dysarthric speech is not directly tested. No comparison of control-derived directions versus directions estimated from patient embeddings, nor any analysis of embedding-geometry shifts (e.g., subspace overlap or principal-component divergence between control and patient sets), is reported. Existing checks (FDR, leave-one-corpus-out, alignment quality) do not address this concern, leaving the reported rho values vulnerable to the possibility that dysarthria collapses distinctions along orthogonal axes.

    Authors: We agree this is a substantive point and that a direct test would further validate the geometric assumption. The original manuscript relied on the observed correlations with clinical ratings (which replicate across languages, aetiologies, and corpora) as indirect evidence that the control-derived directions capture relevant degradation. To address the concern directly, the revised manuscript now includes a comparison of control-derived directions against directions estimated from dysarthric embeddings, along with subspace overlap and principal-component divergence metrics between control and patient sets. These new analyses are reported in an expanded Methods section and confirm that degradation occurs primarily along the control-derived axes rather than orthogonal ones. revision: yes

  2. Referee: [Feature extraction pipeline and robustness checks] Details on how the pretrained MFA aligner performs on dysarthric speech and any quantitative mitigation of alignment errors are insufficient. Dysarthric speech commonly produces higher alignment error rates; without reported alignment accuracy metrics per severity level or sensitivity analyses showing that d-prime scores remain stable under realistic misalignment, the feature-extraction pipeline risks systematic confounds that could inflate or deflate the observed correlations.

    Authors: We acknowledge that the original description of alignment robustness was brief. The manuscript already referenced alignment quality controls, but we have expanded this in the revision by adding quantitative alignment accuracy metrics (boundary error and phone-level accuracy) for dysarthric speech, now stratified by severity level using available corpus annotations. We have also included sensitivity analyses that simulate realistic misalignment rates and demonstrate stability of the d-prime scores and their severity correlations. These additions are incorporated into the Methods and Results sections to mitigate concerns about systematic confounds. revision: yes
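The first exchange above turns on whether contrast directions estimated from controls line up with directions estimated from patient speech. One simple way to probe this is the cosine similarity between the two mean-difference directions, sketched below. Neither the referee nor the simulated rebuttal names a specific metric, so the cosine measure, the helper names, and the toy 2-D tokens are all assumptions.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def mean_difference(pos, neg):
    """Unnormalised contrast direction: positive-class mean minus negative-class mean."""
    return [a - b for a, b in zip(mean_vector(pos), mean_vector(neg))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy 2-D tokens, invented for illustration.
ctrl_dir = mean_difference([[2, 0], [4, 0]], [[-2, 0], [-4, 0]])
aligned_dir = mean_difference([[1, 1], [1, -1]], [[-1, 1], [-1, -1]])
orthogonal_dir = mean_difference([[0, 1], [0, 3]], [[0, -1], [0, -3]])

print(cosine(ctrl_dir, aligned_dir))      # contrast shrinks along the control axis
print(cosine(ctrl_dir, orthogonal_dir))   # contrast has moved to an orthogonal axis
```

A cosine near 1 would support scoring patients along control-derived axes, while a cosine near 0 would indicate the scenario the referee worries about, where the remaining contrast lies in a direction the fixed axis cannot see.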

Circularity Check

0 steps flagged

No significant circularity: derivation is self-contained

Full rationale

The paper computes phonological feature directions exclusively from healthy-control embeddings via a pretrained MFA aligner, then calculates d-prime scores as discriminability along those fixed directions on patient speech. These scores are correlated post-hoc with clinical severity labels; no parameters are fitted to severity data, and the directions are independent of patient labels by construction. The central result is an empirical statistical association (meta-analysis rho values), not a prediction forced by redefinition or self-citation. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. The approach is explicitly training-free and applies control-derived subspaces without circular reduction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The approach relies on standard pretrained models and phonological knowledge rather than introducing new entities or fitting many parameters. The d-prime calculation uses standard statistical methods.

axioms (3)
  • domain assumption: Pretrained HuBERT model encodes phonological information in its representations
    The method uses subspaces in these representations to measure degradation.
  • domain assumption: Montreal Forced Aligner can accurately segment dysarthric speech into phones
    Used to extract phone-level embeddings from patients.
  • domain assumption: The selected phonological contrasts (nasality, voicing, etc.) are relevant to dysarthria severity
    Based on literature but assumed to capture the degradation.

pith-pipeline@v0.9.0 · 5642 in / 1463 out tokens · 73500 ms · 2026-05-10T16:11:55.795433+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

    cs.CL 2026-04 unverdicted novelty 4.0

    Phonological subspace collapse in SSL speech representations produces aetiology-specific degradation profiles that remain stable in shape across languages and model architectures.
