arxiv: 2605.15854 · v1 · pith:OV4SSAHPnew · submitted 2026-05-15 · 📡 eess.AS

Improving Automatic Speech Recognition for Speakers Treated for Oral Cancer using Data Augmentation and LLM Error Correction

Pith reviewed 2026-05-19 19:26 UTC · model grok-4.3

classification 📡 eess.AS
keywords automatic speech recognition · oral cancer · data augmentation · LLM error correction · Whisper model · MMS model · word error rate · Dutch speech

The pith

Combining data augmentation and LLM error correction cuts word error rate on oral cancer speech by 40% (Whisper) and 50% (MMS) in relative terms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard speech recognition systems struggle with the variable speech of people treated for oral cancer due to limited training data. By generating synthetic speech through augmentation techniques like text-to-speech and then using large language models to fix recognition errors, the authors improve performance on Dutch oral cancer speech datasets. This combination leads to substantial reductions in errors for both Whisper and MMS models. A sympathetic reader would care because better ASR could help these patients communicate more easily with technology. The work demonstrates a practical way to adapt existing models without needing massive new real-world recordings.

Core claim

The authors apply various data augmentation techniques to a corpus of Dutch oral cancer speech to create synthetic data and finetune Whisper and MMS models, achieving an average 8% relative WER decrease with TTS augmentation. Employing LLMs for error correction provides an additional 21.4-26.2% relative decrease for finetuned models, resulting in overall 40% and 50% relative WER decreases for Whisper and MMS respectively.
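The figures in this claim are all relative decreases, and stage-wise relative decreases multiply rather than add. The following is a minimal sketch of that arithmetic only; the function names and the 0.50 and 0.30 WER values are hypothetical and are not taken from the paper, which reports no absolute WER in its abstract.

```python
def relative_decrease(baseline_wer: float, new_wer: float) -> float:
    """Relative WER decrease, expressed as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def compound(*stage_decreases: float) -> float:
    """Overall relative decrease when each stage is measured against the previous one."""
    remaining = 1.0
    for decrease in stage_decreases:
        remaining *= 1.0 - decrease
    return 1.0 - remaining


# Hypothetical absolute values: a drop from 0.50 to 0.30 WER is a 40% relative decrease.
print(relative_decrease(0.50, 0.30))

# An 8% decrease followed by a 26.2% decrease compounds to about 32%, not 34.2%.
print(compound(0.08, 0.262))
```

Because the stage-wise figures compound to less than the overall 40% and 50%, the overall numbers are probably measured against a different baseline (for instance the non-finetuned model), although the abstract does not say so explicitly.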

What carries the argument

Data augmentation techniques, particularly text-to-speech synthesis, combined with large language model-based error correction to improve ASR performance on impaired speech.
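The second half of this mechanism, LLM post-correction of ASR output, can be sketched as a thin wrapper around two callables. This is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names, and the fallback behaviour are all hypothetical, and the page does not say which LLM or prompt the paper uses.

```python
from typing import Callable


def build_correction_prompt(hypothesis: str, language: str = "Dutch") -> str:
    """Wrap an ASR hypothesis in a (hypothetical) correction instruction."""
    return (
        f"The following {language} sentence was produced by a speech recognizer "
        "and may contain recognition errors. Return the corrected sentence only, "
        "changing as few words as possible.\n\n"
        f"Sentence: {hypothesis}"
    )


def transcribe_and_correct(
    audio_path: str,
    asr: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Run ASR on one file, then ask the LLM to correct the transcript."""
    raw = asr(audio_path).strip()
    corrected = llm(build_correction_prompt(raw)).strip()
    # Fall back to the raw ASR output if the LLM returns an empty string.
    return corrected or raw
```

Passing the ASR system and the LLM as plain callables keeps the sketch independent of any particular model library; a finetuned Whisper or MMS model would be plugged in as `asr`.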

If this is right

  • Including TTS-generated data when finetuning ASR models on OC speech reduces WER by about 8% relative on average.
  • LLM error correction further decreases WER by 21.4-26.2% relative for finetuned models and by 10.0% relative for non-finetuned ones.
  • The combined approach achieves a 40% relative WER decrease for Whisper and a 50% relative WER decrease for MMS on OC speech.
  • This strategy is viable for recognizing speech from patients treated for oral cancer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar augmentation and correction methods could be tested on other speech impairments like those from stroke or Parkinson's disease.
  • Integrating these techniques into real-time ASR applications might improve accessibility for medical patients in daily use.
  • The success suggests that synthetic data can bridge gaps in medical speech datasets where real recordings are hard to obtain.

Load-bearing premise

The synthetic speech samples must match the real speech variations of oral cancer patients closely enough, and the language model fixes must not change any medically important information.
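One way to probe the second half of this premise is to flag cases where a word the ASR system got right disappears after LLM correction. The sketch below is a deliberately simple illustration; the function name, the protected-term list, and the Dutch example sentences are hypothetical and do not come from the paper.

```python
import re


def dropped_terms(asr_output: str, corrected: str, protected_terms: list[str]) -> list[str]:
    """Return protected terms present in the ASR output but missing after correction."""
    asr_words = set(re.findall(r"\w+", asr_output.lower()))
    corrected_words = set(re.findall(r"\w+", corrected.lower()))
    return sorted(
        term
        for term in protected_terms
        if term.lower() in asr_words and term.lower() not in corrected_words
    )


# Hypothetical example: the LLM "fixes" a correctly recognized drug name.
print(dropped_terms(
    "de patiënt krijgt morfine vandaag",
    "de patiënt krijgt morgen vandaag",
    ["morfine", "patiënt"],
))
```

A check like this only catches exact single-word terms; multi-word expressions and inflected forms would need a more careful matcher.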

What would settle it

Testing the improved models on a new set of real oral cancer speech recordings and finding no reduction in word error rates compared to baseline would disprove the effectiveness of the augmentation and correction pipeline.

Original abstract

In recent years, the performance of automatic speech recognition (ASR) systems has made considerable progress. Unfortunately, for people with speech impairments, such as people treated for oral cancer (OC), ASR performance is still lagging behind. The scarcity and variability of OC speech data makes development of ASR models for this type of speech difficult. In this work, we use data augmentation and large language model (LLM) error correction to mitigate this problem. We apply various augmentation techniques on a corpus of Dutch oral cancer speech to create synthetic data, and evaluate their effect on ASR performance. We finetune Whisper and Massively Multilingual Speech (MMS) models for each augmentation technique and observe, on average, an 8% relative decrease in Word Error Rate (WER) when including data created using text-to-speech (TTS). When employing LLMs for error correction, we see a further 21.4-26.2% relative decrease in WER for finetuned ASR models and a 10.0% relative decrease for non-finetuned models. Overall, we achieve a 40% relative WER decrease for Whisper and a 50% relative WER decrease for MMS, indicating that a combination of data augmentation and LLM correction is a viable strategy for the recognition of OC speech.
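Every result in the abstract is stated in terms of Word Error Rate. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the example sentences are illustrative; the page does not describe the paper's own scoring tool or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("de kat zit op de mat", "de kat zit de mat"))
```

Because insertions count as errors, WER can exceed 1.0 on very poor hypotheses, which matters when interpreting relative decreases on heavily impaired speech.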

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript describes the use of data augmentation (including TTS synthesis on Dutch OC speech transcripts) to generate synthetic training data for fine-tuning Whisper and MMS ASR models, followed by LLM-based error correction on the ASR outputs. It reports an average 8% relative WER reduction from TTS augmentation, additional 21.4-26.2% relative reductions from LLM correction on fine-tuned models, and overall relative WER decreases of 40% for Whisper and 50% for MMS, concluding that the combination is a viable strategy for OC speech recognition.

Significance. If the reported relative gains are confirmed with absolute WER values, proper baselines, statistical tests, and evidence that synthetic data matches real OC acoustic characteristics, the work would demonstrate a practical, low-resource approach to improving ASR for a clinically important impaired-speech population. The empirical focus on a real patient corpus and the combination of augmentation with LLM post-processing are strengths that could inform follow-on studies, though the current evidence leaves the magnitude and generalizability of the gains only partially supported.

major comments (3)
  1. [Abstract] Abstract: the headline claims of 40% and 50% relative WER reductions are presented without any absolute baseline WER values, number of speakers or utterances in the test set, dataset sizes, or statistical significance tests, preventing assessment of whether the improvements are practically meaningful or robust.
  2. [Data Augmentation] Data augmentation and evaluation protocol: the central assumption that TTS-generated synthetic data sufficiently captures post-treatment articulatory distortions (altered formants, hypernasality, consonant distortions) is not supported by any acoustic analysis or direct comparison of synthetic versus real OC recordings; if the test set consists of real patient speech, observed WER drops may reflect domain mismatch rather than improved robustness.
  3. [LLM Error Correction] LLM error correction: the additional WER reductions from LLM post-processing are reported without any domain-specific validation or error analysis confirming that medically relevant terminology and clinical intent are preserved; this is a load-bearing assumption for the claim that the pipeline is viable for OC speech.
minor comments (1)
  1. [Abstract] The description of the 8% average relative decrease from TTS augmentation does not specify how the average is computed across the different augmentation techniques and the two base models.
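The statistical test requested in the first major comment could take the form of a paired bootstrap over test utterances. The following is one possible sketch, not a procedure the paper is known to use; the function name, argument layout, and default resample count are assumptions.

```python
import random


def bootstrap_wer_diff(
    ref_lengths: list[int],
    base_errors: list[int],
    sys_errors: list[int],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Confidence interval for (baseline WER - system WER), resampling utterances with replacement."""
    n = len(ref_lengths)
    if n == 0 or len(base_errors) != n or len(sys_errors) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Draw the same utterance indices for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        total_words = sum(ref_lengths[i] for i in idx)
        base_wer = sum(base_errors[i] for i in idx) / total_words
        sys_wer = sum(sys_errors[i] for i in idx) / total_words
        diffs.append(base_wer - sys_wer)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

If the returned interval excludes zero, the WER difference is unlikely to be a sampling artifact at the chosen level. With few speakers, resampling by speaker rather than by utterance would be the more conservative choice.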

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We have addressed each major comment in detail below.

  1. Referee: [Abstract] Abstract: the headline claims of 40% and 50% relative WER reductions are presented without any absolute baseline WER values, number of speakers or utterances in the test set, dataset sizes, or statistical significance tests, preventing assessment of whether the improvements are practically meaningful or robust.

    Authors: We agree that the abstract would benefit from additional context to allow readers to better evaluate the practical significance of the results. In the revised manuscript, we have updated the abstract to report the absolute baseline WER values for the models, the number of speakers and utterances in the test set, the relevant dataset sizes, and a note on the statistical significance of the observed improvements. revision: yes

  2. Referee: [Data Augmentation] Data augmentation and evaluation protocol: the central assumption that TTS-generated synthetic data sufficiently captures post-treatment articulatory distortions (altered formants, hypernasality, consonant distortions) is not supported by any acoustic analysis or direct comparison of synthetic versus real OC recordings; if the test set consists of real patient speech, observed WER drops may reflect domain mismatch rather than improved robustness.

    Authors: We thank the referee for this observation. Our TTS augmentation is based on transcripts from the OC corpus to increase exposure to domain-specific lexical content and sentence structures rather than to synthesize the precise acoustic distortions of impaired speech. The test set consists of real patient recordings, and the reported gains are empirical. We have revised the manuscript to include an explicit discussion of this limitation of the augmentation approach and its implications for interpreting the source of the WER reductions. revision: partial

  3. Referee: [LLM Error Correction] LLM error correction: the additional WER reductions from LLM post-processing are reported without any domain-specific validation or error analysis confirming that medically relevant terminology and clinical intent are preserved; this is a load-bearing assumption for the claim that the pipeline is viable for OC speech.

    Authors: We agree that domain-specific validation is important for this component. In the revised manuscript, we have added an error analysis of the LLM corrections that examines preservation of medically relevant terminology and clinical intent, including representative examples and a summary of the types of changes made by the LLM. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ASR finetuning and evaluation

The manuscript presents an experimental pipeline: data augmentation via TTS and other techniques on Dutch OC transcripts, finetuning of Whisper and MMS models, followed by LLM-based error correction, with WER measured on held-out real OC recordings. No equations, uniqueness theorems, or self-citations are invoked to derive the reported 40-50% relative WER reductions; the gains are direct empirical outcomes of the described training and inference steps. The central claims rest on standard ML evaluation protocols rather than any reduction to author-defined parameters or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard machine-learning assumptions about data augmentation and post-processing rather than new free parameters or invented entities.

axioms (1)
  • domain assumption Synthetic speech generated by TTS can usefully supplement scarce real recordings of oral cancer speech for model training.
    This premise underpins the data-augmentation experiments described in the abstract.

pith-pipeline@v0.9.0 · 5791 in / 1274 out tokens · 60375 ms · 2026-05-19T19:26:02.230336+00:00 · methodology
