arxiv: 2604.21706 · v1 · submitted 2026-04-23 · 💻 cs.CL

Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

Pith reviewed 2026-05-09 21:17 UTC · model grok-4.3

classification 💻 cs.CL
keywords dysarthria · phonological subspaces · self-supervised speech representations · aetiology-specific profiles · cross-lingual stability · d-prime separability · speech motor disorders · Parkinson's disease

The pith

Dysarthria from different causes produces distinct phonological subspace collapse patterns that keep the same shape across languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper scales a training-free method for measuring how dysarthria collapses phonological feature subspaces in frozen self-supervised speech models, from 890 speakers in 5 languages to 3,374 speakers across 12 languages and five aetiologies. It finds that the resulting degradation profiles separate the aetiologies at the group level for most features (10 of 13), and that the relative shape of each profile stays highly consistent across the languages available for each aetiology. The same severity gradients emerge from six different model backbones and survive when token counts are fixed, indicating that the signal is not an artefact of how many tokens are available. If these results hold, group-level speech profiles could help indicate the underlying cause of impaired speech production in a language-independent way, though individual-level classification remains weak and absolute severity scores would still need calibration within each dataset.

Core claim

Aetiology-specific degradation profiles are distinguishable at the group level: 10 of 13 features yield large effect sizes (epsilon-squared > 0.14), and Parkinson's disease is separable from the articulatory execution group at Cohen's d = 0.83. The cosine similarity of 5-dimensional consonant d-prime profiles exceeds 0.95 across the languages available for each aetiology. All six SSL backbones produce monotonic severity gradients with inter-model agreement above rho = 0.77, and fixed-token estimation preserves the severity correlation (rho = -0.733 at 200 tokens per class).
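The cosine-similarity criterion compares the shape of two profiles and ignores their overall scale, which is why high similarity can coexist with uncalibrated absolute magnitudes. A minimal sketch follows; the profile values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length profile vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 5-dimensional consonant d-prime profiles (illustrative only).
profile_lang_a = [1.8, 1.2, 0.9, 1.5, 1.1]
# Same shape at half the magnitude, standing in for a second language.
profile_lang_b = [0.5 * v for v in profile_lang_a]

similarity = cosine_similarity(profile_lang_a, profile_lang_b)
```

Because the second profile is a rescaled copy of the first, the similarity is 1 up to floating-point error even though every absolute value differs by a factor of two.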

What carries the argument

The d-prime separability of phonological feature subspaces in frozen self-supervised speech representations. It quantifies how well the model's embeddings distinguish phonological classes (for example, consonant and vowel contrasts); a lower d-prime relative to healthy controls indicates that the corresponding subspace has collapsed.
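To make the quantity concrete, here is a minimal sketch of a d-prime computation between two phonological classes. It assumes the standard signal-detection form with an equal-weight pooled variance, and it assumes the frames are projected onto the axis joining the two class means. Neither detail is confirmed by this page, and the helper names and toy vectors are hypothetical.

```python
import math
from statistics import mean, variance


def d_prime(scores_a, scores_b):
    """d-prime between two sets of scalar scores (equal-weight pooled variance)."""
    pooled_sd = math.sqrt(0.5 * (variance(scores_a) + variance(scores_b)))
    if pooled_sd == 0.0:
        raise ValueError("d-prime is undefined when both classes have zero variance")
    return abs(mean(scores_a) - mean(scores_b)) / pooled_sd


def project_on_mean_difference(frames_a, frames_b):
    """Project frame vectors onto the axis joining the class means (assumed choice)."""
    dim = len(frames_a[0])
    mu_a = [mean(f[i] for f in frames_a) for i in range(dim)]
    mu_b = [mean(f[i] for f in frames_b) for i in range(dim)]
    axis = [x - y for x, y in zip(mu_a, mu_b)]

    def dot(frame):
        return sum(v * w for v, w in zip(frame, axis))

    return [dot(f) for f in frames_a], [dot(f) for f in frames_b]


# Toy 2-D "embeddings" for two classes; values are illustrative only.
clear_a = [[2.0, 0.1], [2.2, -0.1], [1.8, 0.0], [2.0, 0.0]]
clear_b = [[-2.0, 0.1], [-2.2, -0.1], [-1.8, 0.0], [-2.0, 0.0]]
# Same spread, but the class means sit much closer together.
collapsed_a = [[0.4, 0.1], [0.6, -0.1], [0.2, 0.0], [0.4, 0.0]]
collapsed_b = [[-0.4, 0.1], [-0.6, -0.1], [-0.2, 0.0], [-0.4, 0.0]]

d_clear = d_prime(*project_on_mean_difference(clear_a, clear_b))
d_collapsed = d_prime(*project_on_mean_difference(collapsed_a, collapsed_b))
```

In this toy setup the collapsed pair yields the smaller d-prime, which is the direction of change the page describes for dysarthric speech.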

If this is right

  • Supports a training-free, architecture-independent framework for aetiology-aware dysarthria characterisation.
  • Enables language-independent phenotyping of degradation patterns, though absolute severity still requires within-corpus calibration.
  • Group-level distinction works for most phonological features while individual-level classification stays limited at 22.6 percent macro F1.
  • Cross-backbone agreement above rho = 0.77 and preserved correlations at fixed token counts confirm the signal is robust and not a token-count artefact.
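The fixed-token check in the last point can be pictured as drawing the same number of tokens from every phonological class before estimating d-prime, so that speakers with more speech do not get systematically different estimates. The sketch below assumes uniform sampling without replacement and skips speakers who lack enough tokens. The function name, the class labels, and the sampling scheme are hypothetical; only the figure of 200 tokens per class comes from the abstract.

```python
import random


def fixed_token_subsample(tokens_by_class, n_tokens=200, seed=0):
    """Draw exactly n_tokens items per class without replacement.

    Returns None when any class has fewer than n_tokens items, so the
    caller can exclude that speaker from the fixed-token analysis.
    """
    rng = random.Random(seed)
    sampled = {}
    for label in sorted(tokens_by_class):  # sorted for reproducible draws
        tokens = tokens_by_class[label]
        if len(tokens) < n_tokens:
            return None
        sampled[label] = rng.sample(tokens, n_tokens)
    return sampled


# Hypothetical token identifiers for one speaker.
speaker_tokens = {
    "stop": list(range(500)),
    "fricative": list(range(1000, 1250)),
}
subsample = fixed_token_subsample(speaker_tokens, n_tokens=200, seed=0)
too_few = fixed_token_subsample(speaker_tokens, n_tokens=300, seed=0)
```

Returning `None` instead of padding or sampling with replacement keeps the token count truly fixed, at the cost of dropping speakers with little data.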

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Automated tools could screen for likely aetiology from speech samples alone in multilingual clinical settings.
  • The high cross-lingual shape stability suggests phonological subspaces break down in ways that are fundamental to speech motor control rather than language-specific.
  • Longitudinal tracking of the same speakers could reveal how subspace collapse progresses with disease stage.
  • Calibration methods that align absolute d-prime values across datasets would allow severity comparisons between studies and languages.

Load-bearing premise

The measured d-prime differences truly reflect aetiology-driven phonological subspace collapse rather than recording conditions, speaker demographics or dataset-specific artifacts.

What would settle it

Re-analysis of d-prime profiles after matching speakers across aetiologies for age, sex, recording quality and language; if the group differences disappear, the claim that profiles are aetiology-specific would be falsified.
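One simple way to approximate such a matched re-analysis is to compute the effect size separately inside each stratum (for example, one dataset, or one dataset by sex by age band) and see whether it survives there. The sketch below assumes Cohen's d with a pooled standard deviation. The record layout, group labels, and stratum names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation; positive when group_a is higher."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0.0:
        raise ValueError("Cohen's d is undefined when the pooled variance is zero")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def stratified_cohens_d(records, group_a, group_b, min_n=2):
    """Per-stratum Cohen's d from (stratum, group, value) records.

    Strata that do not contain at least min_n speakers of both groups are skipped.
    """
    by_stratum = defaultdict(lambda: defaultdict(list))
    for stratum, group, value in records:
        by_stratum[stratum][group].append(value)
    effects = {}
    for stratum, groups in by_stratum.items():
        a, b = groups.get(group_a, []), groups.get(group_b, [])
        if len(a) >= min_n and len(b) >= min_n:
            effects[stratum] = cohens_d(a, b)
    return effects


# Hypothetical per-speaker d-prime values, keyed by dataset and group.
records = [
    ("dataset_1", "PD", 1.0), ("dataset_1", "PD", 3.0), ("dataset_1", "PD", 5.0),
    ("dataset_1", "HC", 2.0), ("dataset_1", "HC", 4.0), ("dataset_1", "HC", 6.0),
    ("dataset_2", "PD", 1.5), ("dataset_2", "PD", 2.5),  # no controls here
]
effects = stratified_cohens_d(records, "PD", "HC")
```

A stratum that contains only one group, such as `dataset_2` here, contributes nothing. That is the point of the exercise: contrasts that exist only between datasets cannot be separated from dataset effects.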

Figures

Figures reproduced from arXiv: 2604.21706 by Antonio Armando Ortiz Barrañón, Bernard Muller, LaVonne Roberts.

Figure 1
Figure 1: Deviation from healthy controls by aetiology (Cohen’s d). Grey cells indicate missing data (insufficient speakers for effect size computation; Stroke × vowel triangle area has fewer than 5 speakers with valid estimates). Rows show 13 phonological and prosodic features; columns show 5 dysarthric aetiologies. Darker red indicates greater degradation from HC baseline. To rule out the possibility that aetiolog…
Figure 2
Figure 2: Pairwise aetiology comparison (HC-normalised to 1.0). Each panel shows two aetiologies against the HC reference (light blue). Distinct shapes reflect aetiology-specific degradation patterns. 4.2 Cross-lingual profile-shape stability. A central question for clinical deployment is whether phonological degradation profiles are language-specific or consistent across languages. If a PD patient in Slovakia sh…
Figure 3
Figure 3: HC-normalised Parkinson’s disease profiles across 6 languages (those with n>=3 PD speakers). Each bar shows the ratio of PD mean to language-specific HC mean for 9 d-prime features (1.0 = healthy). The parallel pattern across languages demonstrates cross-lingual consistency of the PD phonological profile. As an anecdotal observation, the single Swahili PD speaker (n = 1) has a consonant d-prime profile wit…
Figure 4
Figure 4: Severity gradient across 6 SSL backbones. Error bars show 95% bootstrap confidence intervals (1,000 resamples over speakers). The smallest mild–moderate margin (XLS-R, 0.004) is not significant; all other adjacent-severity differences exceed the bootstrap CI width. All models show monotonic decrease from control to severe. Absolute d-prime magnitudes differ by model architecture, but the gradient direction…
Figure 5
Figure 5: Inter-model agreement on per-speaker composite consonant d-prime (Spearman rho).
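The error bars in Figure 4 are described as 95% bootstrap confidence intervals from 1,000 resamples over speakers. A minimal sketch of one common variant, the percentile bootstrap of a group mean, is given below. Whether the paper uses the percentile method is an assumption, and the per-speaker values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-speaker values."""
    rng = random.Random(seed)
    n = len(values)
    # Each resample draws n speakers with replacement and records the mean.
    resampled_means = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int(n_resamples * alpha / 2)
    upper_index = n_resamples - 1 - lower_index
    return resampled_means[lower_index], resampled_means[upper_index]


# Hypothetical per-speaker composite d-prime values for one severity group.
speaker_d_prime = [1.9, 2.1, 1.7, 2.4, 2.0, 1.8, 2.2, 2.3, 1.6, 2.0]
ci_low, ci_high = bootstrap_ci(speaker_d_prime, n_resamples=1000, seed=0)
```

Resampling whole speakers rather than individual tokens keeps each speaker's measurements together, so the interval reflects between-speaker variability.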
Original abstract

We previously introduced a training-free method for dysarthria severity assessment based on d-prime separability of phonological feature subspaces in frozen self-supervised speech representations, validated on 890 speakers across 5 languages with HuBERT-base. Here, we scale the analysis to 3,374 speakers from 25 datasets spanning 12 languages and 5 aetiologies (Parkinson's disease, cerebral palsy, ALS, Down syndrome, and stroke), plus healthy controls, using 6 SSL backbones. We report three findings. First, aetiology-specific degradation profiles are distinguishable at the group level: 10 of 13 features yield large effect sizes (epsilon-squared > 0.14, Holm-corrected p < 0.001), with Parkinson's disease separable from the articulatory execution group at Cohen's d = 0.83; individual-level classification remains limited (22.6% macro F1). Second, profiles show cross-lingual profile-shape stability: cosine similarity of 5-dimensional consonant d-prime profiles exceeds 0.95 across the languages available for each aetiology. Absolute d-prime magnitudes are not cross-lingually calibrated, so the method supports language-independent phenotyping of degradation patterns but requires within-corpus calibration for absolute severity interpretation. Third, the method is architecture-independent: all 6 backbones produce monotonic severity gradients with inter-model agreement exceeding rho = 0.77. Fixed-token d-prime estimation preserves the severity correlation (rho = -0.733 at 200 tokens per class), confirming that the signal is not a token-count artefact. These results support phonological subspace analysis as a robust, training-free framework for aetiology-aware dysarthria characterisation, with evidence of cross-lingual profile-shape stability and cross-backbone robustness in the represented sample.
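Both the inter-model agreement and the severity correlation quoted in the abstract are Spearman rank correlations. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, which handles ties. All numbers are hypothetical and serve only to show the mechanics.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation, computed as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-speaker composite d-prime from two backbones.
backbone_a = [2.1, 1.7, 1.2, 0.9, 0.6]
backbone_b = [3.0, 2.6, 2.7, 1.5, 1.1]
agreement = spearman_rho(backbone_a, backbone_b)

# Hypothetical ordinal severity ratings (0 = control, 3 = severe), with a tie.
severity = [0, 1, 2, 3, 3]
severity_correlation = spearman_rho(severity, backbone_a)
```

Because only ranks enter the calculation, two backbones can agree strongly even when their absolute d-prime scales differ, and a negative severity correlation means d-prime falls as rated severity rises.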

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper scales a training-free d-prime separability analysis of phonological feature subspaces in frozen SSL speech representations to 3,374 speakers across 25 datasets, 12 languages, and 5 dysarthria aetiologies (PD, CP, ALS, DS, stroke) plus controls. It reports three main results: (1) aetiology-specific degradation profiles are group-distinguishable (10/13 features with ε² > 0.14, Holm p < 0.001; PD vs. articulatory-execution group d = 0.83), though individual classification is limited (22.6% macro F1); (2) consonant d-prime profile shapes are cross-lingually stable (cosine > 0.95) while absolute magnitudes are not; (3) the pattern is robust across 6 SSL backbones (inter-model ρ > 0.77) and preserved under fixed-token estimation.

Significance. If the attribution to aetiology holds after confound controls, the work supplies a scalable, training-free phenotyping tool for dysarthria that distinguishes degradation patterns by cause, with profile shapes that are stable across languages and consistent across the six backbones tested. This has direct clinical relevance for severity assessment and subgrouping. The large multi-dataset, multi-language sample and explicit reporting of effect sizes, p-values, and inter-model agreement are strengths.

major comments (2)
  1. [Methods/Results (pooled analysis)] Methods and Results sections: the pooled analysis across 25 heterogeneous datasets attributes between-aetiology d-prime differences directly to disease without reported dataset-level matching, regression on recording-quality covariates (microphone, sampling rate, noise), or within-dataset replication of the aetiology contrast. Because SSL embeddings are sensitive to corpus-level statistics, the reported ε² > 0.14 and d = 0.83 could partly reflect stable dataset artifacts rather than phonological subspace collapse; the cross-lingual cosine > 0.95 is equally consistent with a stable confound pattern.
  2. [Results (classification performance)] Results, first finding: individual-level classification performance is reported as only 22.6% macro F1 despite large group-level effect sizes; this gap is acknowledged but not quantified with respect to how much of the group separability survives after speaker-level demographic or recording controls, weakening the claim that the profiles are aetiology-specific at a clinically usable level.
minor comments (2)
  1. [Abstract] Abstract: the claim of 'architecture-independent' results should be qualified by the specific 6 backbones tested rather than left unqualified.
  2. [Discussion] The paper notes that absolute d-prime magnitudes require within-corpus calibration; this limitation should be stated more prominently when discussing clinical translation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting important methodological concerns regarding potential confounds in our pooled analysis and the interpretation of our classification results. We provide point-by-point responses below and indicate where revisions will be made to address these issues.

Point-by-point responses
  1. Referee: Methods and Results sections: the pooled analysis across 25 heterogeneous datasets attributes between-aetiology d-prime differences directly to disease without reported dataset-level matching, regression on recording-quality covariates (microphone, sampling rate, noise), or within-dataset replication of the aetiology contrast. Because SSL embeddings are sensitive to corpus-level statistics, the reported ε² > 0.14 and d = 0.83 could partly reflect stable dataset artifacts rather than phonological subspace collapse; the cross-lingual cosine > 0.95 is equally consistent with a stable confound pattern.

    Authors: We agree that the absence of explicit controls for dataset-level factors such as recording quality is a limitation of the current pooled analysis. Although we have replication across multiple datasets for several aetiologies and observe high cross-lingual profile stability, this does not fully rule out stable confounds. In the revised manuscript, we will include additional analyses: (1) regression of d-prime values on available recording metadata where reported across datasets, and (2) within-dataset effect size calculations for aetiologies represented in multiple corpora. These will be reported in a new supplementary section to better isolate aetiology-specific effects. revision: partial

  2. Referee: Results, first finding: individual-level classification performance is reported as only 22.6% macro F1 despite large group-level effect sizes; this gap is acknowledged but not quantified with respect to how much of the group separability survives after speaker-level demographic or recording controls, weakening the claim that the profiles are aetiology-specific at a clinically usable level.

    Authors: We acknowledge that the modest individual classification performance (22.6% macro F1) limits clinical applicability at the single-speaker level, and we have not yet quantified the robustness of group separability after speaker-level controls. In revision, we will add speaker-level analyses using mixed-effects models to evaluate the unique variance explained by aetiology after accounting for demographics (age, sex) and dataset indicators. This will provide a clearer assessment of the aetiology-specific signal at both group and individual levels. revision: yes

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on new data

The paper applies a previously introduced d-prime separability method to a new collection of 3,374 speakers across 25 datasets. All reported quantities (epsilon-squared effect sizes, Cohen's d, cosine similarities of profile shapes, Spearman correlations) are computed directly from the frozen SSL embeddings and group labels in the current data. No equations redefine the target profiles in terms of themselves, no fitted parameters are relabeled as predictions, and the self-citation to the prior method paper is not load-bearing for the distinguishability or stability claims. The derivation chain consists of standard statistical comparisons on independent samples and therefore contains no reduction by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical scaling study with no mathematical derivations; relies on standard statistical tests and pre-trained SSL models whose internal representations are treated as given.

pith-pipeline@v0.9.0 · 5644 in / 1190 out tokens · 42311 ms · 2026-05-09T21:17:35.117786+00:00 · methodology

Reference graph

Works this paper leans on

47 extracted references · 34 canonical work pages · 3 internal anchors

  1. [1]

    Duffy, J. R. (2019). Motor Speech Disorders: Substrates, Differential Diagnosis, and Management (4th ed.). Elsevier

  2. [2]

    Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

    Muller, B., Ortiz Barranon, A. A., and Roberts, L. (2026). Training-free cross-lingual dysarthria severity assessment via phonological subspace analysis in self-supervised speech representations. arXiv preprint arXiv:2604.10123. doi:10.48550/arXiv.2604.10123

  3. [3]

    Choi, K., Yeo, E., Cho, C. J., Mortensen, D. R., and Harwath, D. (2026). Self-supervised speech models encode phonetic context via position-dependent orthogonal subspaces. arXiv preprint arXiv:2603.12642

  4. [4]

    Cho, C. J., Wu, P., Mohamed, A., and Anumanchipalli, G. K. (2023). Evidence of vocal tract articulation in self-supervised learning of speech. In Proceedings of ICASSP 2023, 1–5. doi:10.1109/ICASSP49357.2023.10094711

  5. [5]

    Halpern, B. M., Tienkamp, T., Abur, D., and Toda, T. (2026). PathBench: Speech intelligibility benchmark for automatic pathological speech assessment. arXiv preprint arXiv:2603.08097. doi:10.48550/arXiv.2603.08097

  6. [6]

    Hernandez, A., Yeo, E., Choi, K., Li, C.-J., Yue, Z., Das, R. K., Rusz, J., Magimai Doss, M., Orozco-Arroyave, J. R., Arias-Vergara, T., Maier, A., Noth, E., Mortensen, D. R., Harwath, D., and Perez-Toro, P. A. (2026). Adapting self-supervised speech representations for cross-lingual dysarthria detection in Parkinson’s disease. arXiv:2603.22225.

  7. [7]

    Rios-Urrego, C. D., Rusz, J., and Orozco-Arroyave, J. R. (2024). Automatic speech-based assessment to discriminate Parkinson’s disease from essential tremor with a cross-language approach. npj Digital Medicine, 7, 37. doi:10.1038/s41746-024-01027-6

  8. [8]

    Multilingual dysarthric speech assessment using universal phone recognition and language-specific phonemic contrast modeling

    Yeo, E. J., Liss, J. M., Berisha, V., and Mortensen, D. R. (2026). Multilingual dysarthric speech assessment using universal phone recognition and language-specific phonemic contrast modeling. arXiv preprint arXiv:2601.21205. doi:10.48550/arXiv.2601.21205

  9. [9]

    Baevski, A., Zhou, H., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems, 33, 12449–12460

  10. [10]

    Kadirvelu, B., Stumpf, L., Waibel, S., and Faisal, A. A. (2025). Speaker-independent dysarthria severity classification using self-supervised transformers and multi-task learning. PLOS Digital Health, 4(11), e0001076. doi:10.1371/journal.pdig.0001076

  11. [11]

    Babu, A., Wang, C., Tjandra, A., Lakhotia, K., Xu, Q., Goyal, N., Singh, K., von Platen, P., Saraf, Y., Pino, J., Baevski, A., Conneau, A., and Auli, M. (2022). XLS-R: Self-supervised cross-lingual speech representation learning at scale. In Proceedings of Interspeech 2022, 2278–2282. doi:10.21437/Interspeech.2022-143

  12. [12]

    Violeta, L. P., Huang, W.-C., and Toda, T. (2022). Investigating self-supervised pretraining frameworks for pathological speech recognition. In Proceedings of Interspeech 2022. doi:10.21437/Interspeech.2022-10043

  13. [13]

    Sapkota, B., Shrestha, S., and Baral, R. (2025). Do all features matter? Layer-wise feature probing of self-supervised speech models for dysarthria severity classification. Speech Communication, 175, 103326. doi:10.1016/j.specom.2025.103326

  14. [14]

    Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

    Bae, J., Zheng, X., Kim, M., Yoo, C. D., and Hasegawa-Johnson, M. (2026). Something from nothing: Data augmentation for robust severity level estimation of dysarthric speech. arXiv:2603.15988

  15. [15]

    Javanmardi, F., Arias-Vergara, T., Orozco-Arroyave, J. R., and Nöth, E. (2024). Pre-trained models for detection and severity level classification of dysarthria from speech. Speech Communication, 156, 103047. doi:10.1016/j.specom.2024.103047

  16. [16]

    HuBERT: Self-supervised speech representation learning by masked prediction of hidden units

    Hsu, W.-N., Bolte, B., Tsai, Y.-H. H., Lakhotia, K., Salakhutdinov, R., and Mohamed, A. (2021). HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3451–3460. doi:10.1109/TASLP.2021.3122291

  17. [17]

    Macmillan, N. A. and Creelman, C. D. (2005). Detection Theory: A User’s Guide (2nd ed.). Lawrence Erlbaum Associates

  18. [18]

    Hasegawa-Johnson, M., Zheng, X., Kim, H., Mendes, C., Dickinson, M., Hege, E., Zwilling, C., Moore Channell, M., Mattie, L., Hodges, H., Ramig, L., Bellard, M., Shebanek, M., Sari, L., Kalgaonkar, K., Frerichs, D., Bigham, J. P., Findlater, L., Lea, C., Herrlinger, S., Korn, P., Abou-Zahra, S., Heywood, R., Tomanek, K., and MacDonald, B. (2024). Community...

  19. [19]

    Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015). LibriSpeech: An ASR corpus based on public domain audio books. In Proceedings of ICASSP 2015, 5206–5210. doi:10.1109/ICASSP.2015.7178964

  20. [20]

    Rudzicz, F., Namasivayam, A. K., and Wolff, T. (2012). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46(4), 523–541. doi:10.1007/s10579-011-9145-0

  21. [21]

    Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T., Watkin, K., and Frame, S. (2008). Dysarthric speech database for universal access research. In Proceedings of Interspeech 2008, 1741–1744. doi:10.21437/Interspeech.2008-480

  22. [22]

    Rusko, M., Sabo, R., Trnka, M., Zimmermann, A., Malaschitz, R., Ruzicky, E., Brandoburova, P., Kevicka, V., and Skorvanek, M. (2024). Slovak database of speech affected by neurodegenerative diseases. Scientific Data, 11, 1320. doi:10.1038/s41597-024-04171-6

  23. [23]

    Jesus, L. M. T., Belo, I., Machado, J., and Hall, A. (2017). The advanced voice function assessment databases (AVFAD): Tools for voice clinicians and speech research. In Advances in Speech-language Pathology. IntechOpen. doi:10.5772/intechopen.69643

  24. [24]

    Middag, C., Martens, J.-P., Van Nuffelen, G., and De Bodt, M. (2009). Automated intelligibility assessment of pathological speech using phonological features. EURASIP Journal on Advances in Signal Processing, 2009, 1–9. doi:10.1155/2009/629030

  25. [25]

    Ganzeboom, M., Bakker, M., Beijer, L., Strik, H., and Rietveld, T. (2022). A serious game for speech training in dysarthric speakers with Parkinson’s disease: Exploring therapeutic efficacy and patient satisfaction. International Journal of Language and Communication Disorders, 57(5), 1091–1106. doi:10.1111/1460-6984.12722

  26. [26]

    Ganzeboom, M., Bakker, M., Beijer, L., Rietveld, T., and Strik, H. (2018). Speech training for neurological patients using a serious game. British Journal of Educational Technology, 49(4), 761–774. doi:10.1111/bjet.12640

  27. [27]

    Mendes-Laureano, J., Gomez-Garcia, J. A., Guerrero-Lopez, A., Luque-Buzo, E., Arias-Londono, J. D., Grandas-Perez, F. J., and Godino-Llorente, J. I. (2024). NeuroVoz: A Castilian Spanish corpus of parkinsonian speech. Scientific Data, 11, 1367. doi:10.1038/s41597-024-04186-z

  28. [28]

    Orozco-Arroyave, J. R., Arias-Londono, J. D., Vargas-Bonilla, J. F., Gonzalez-Rativa, M. C., and Noth, E. (2014). New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. In Proceedings of LREC 2014, 342–347

  29. [29]

    Dimauro, G., Di Nicola, V., Bevilacqua, V., Caivano, D., and Girardi, F. (2017). Assessment of speech intelligibility in Parkinson’s disease using a speech-to-text system. IEEE Access, 5, 22199–22208. doi:10.1109/ACCESS.2017.2762475

  30. [30]

    Turrisi, R., Braccia, A., Emanuele, M., Giulietti, S., Pugliatti, M., Sensi, M., Fadiga, L., and Badino, L. (2021). EasyCall corpus: A dysarthric speech dataset. In Proceedings of Interspeech 2021, 41–45. doi:10.21437/Interspeech.2021-549

  31. [31]

    Gao, M., Chen, H., Du, J., Xu, X., Guo, H., Bu, H., Yang, J., Li, M., and Lee, C.-H. (2024). Enhancing voice wake-up for dysarthria: Mandarin Dysarthria Speech Corpus release and customized system design. In Proceedings of Interspeech 2024. doi:10.21437/Interspeech.2024-879

  32. [32]

    Wan, Y., Sun, M., Kang, X., Li, J., Guo, P., Gao, M., and Wang, S.-J. (2024). CDSD: Chinese dysarthria speech database. In Proceedings of Interspeech 2024, 4109–4113. doi:10.21437/Interspeech.2024-1597

  33. [33]

    OpenSLR (2020). SLR65: Crowdsourced high-quality Tamil multi-speaker speech dataset [Dataset]. https://www.openslr.org/65/ (accessed 2026-03-01)

  34. [34]

    Puetzer, M. and Barry, W. J. (2007). Saarbruecken Voice Database. Institute of Phonetics, Saarland University. http://www.stimmdatenbank.coli.uni-saarland.de/

  35. [35]

    Mihajlik, P., Toth, L., and Nemeth, G. (2023). Hungarian dysarthric speech database [Dataset]. Budapest University of Technology and Economics

  36. [36]

    Kenyan Swahili Dysarthric Speech Corpus [Dataset]

    CDLI (2024). Kenyan Swahili Dysarthric Speech Corpus [Dataset]. Centre for Digital Language Inclusion, University of Cape Town. https://www.cdli.uct.ac.za/ (accessed 2026-03-15)

  37. [37]

    Stipancic, K. L., Palmer, K. M., Rowe, H. P., Yunusova, Y., Berry, J. D., and Green, J. R. (2021). You say severe, I say mild: Toward an empirical classification of dysarthria severity. Journal of Speech, Language, and Hearing Research, 64(12), 4718–4735. doi:10.1044/2021_JSLHR-21-00197

  38. [38]

    Grosman, J. (2021). Fine-tuned XLSR-53 large models for speech recognition [Model collection]. HuggingFace. https://huggingface.co/jonatasgrosman (accessed 2026-04-01)

  39. [39]

    Zheng, X., Phukon, B., Na, J., Cutrell, E., Han, K. J., Hasegawa-Johnson, M., Jiang, P.-P., Kuila, A., Lea, C., MacDonald, B., Mantena, G., Ravichandran, V., Sari, L., Tomanek, K., Yoo, C. D., and Zwilling, C. (2025). The Interspeech 2025 Speech Accessibility Project Challenge. In Proceedings of Interspeech 2025, 3269–3273. doi:10.21437/Interspeech.2025-566

  40. [40]

    Dubbioso, R., Spisto, M., Verde, L., Iuzzolino, V. V., Senerchia, G., Salvatore, E., De Pietro, G., De Falco, I., and Sannino, G. (2024). Voice signals database of ALS patients with different dysarthria severity and healthy controls. Scientific Data, 11(1), 800. doi:10.1038/s41597-024-03597-2

  41. [41]

    Pratap, V., Tjandra, A., Shi, B., Tomasello, P., Babu, A., Kundu, S., Elkahky, A., Ni, Z., Vyas, A., Fazel-Zarandi, M., Baevski, A., Adi, Y., Zhang, X., Hsu, W.-N., Conneau, A., and Auli, M. (2024). Scaling speech technology to 1,000+ languages. Journal of Machine Learning Research, 25(97), 1–52

  42. [42]

    Chen, S., Wang, C., Chen, Z., Wu, Y., Liu, S., Chen, Z., Li, J., Kanda, N., Yoshioka, T., Xiao, X., Wu, J., Zhou, L., Ren, S., Qian, Y., Qian, Y., Wu, J., Zeng, M., Yu, X., and Wei, F. (2022). WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing, 16(6), 1505–1518. doi:10.1...

  43. [43]

    Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates

  44. [44]

    The SSNCE Database of Tamil Dysarthric Speech [Dataset]

    LDC (2021). The SSNCE Database of Tamil Dysarthric Speech [Dataset]. Linguistic Data Consortium, LDC2021S04. doi:10.35111/hkh2-vh40

  45. [45]

    McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In Proceedings of Interspeech 2017, 498–502. doi:10.21437/Interspeech.2017-1386

  46. [46]

    Van Nuffelen, G., Middag, C., De Bodt, M., and Martens, J.-P. (2009). Speech technology-based assessment of phoneme intelligibility in dysarthria. International Journal of Language and Communication Disorders, 44(5), 716–730. doi:10.1080/13682820802342062

  47. [47]

    Westfall, J. and Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think. PLOS ONE, 11(3), e0152719. doi:10.1371/journal.pone.0152719