Improving Automatic Speech Recognition for Speakers Treated for Oral Cancer using Data Augmentation and LLM Error Correction

Bence Mark Halpern; Hidde Folkertsma; Jiapan Guo; Max Witjes; Rob van Son; Sebastiaan de Visscher; Thomas Tienkamp

arxiv: 2605.15854 · v1 · pith:OV4SSAHPnew · submitted 2026-05-15 · 📡 eess.AS

Hidde Folkertsma, Thomas Tienkamp, Sebastiaan de Visscher, Max Witjes, Rob van Son, Jiapan Guo, Bence Mark Halpern

Pith reviewed 2026-05-19 19:26 UTC · model grok-4.3

classification 📡 eess.AS

keywords: automatic speech recognition, oral cancer, data augmentation, LLM error correction, Whisper model, MMS model, word error rate, Dutch speech

The pith

Combining data augmentation and LLM error correction cuts word error rates by a relative 40-50% for oral cancer speech recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard speech recognition systems struggle with the variable speech of people treated for oral cancer due to limited training data. By generating synthetic speech through augmentation techniques like text-to-speech and then using large language models to fix recognition errors, the authors improve performance on Dutch oral cancer speech datasets. This combination leads to substantial reductions in errors for both Whisper and MMS models. A sympathetic reader would care because better ASR could help these patients communicate more easily with technology. The work demonstrates a practical way to adapt existing models without needing massive new real-world recordings.

Core claim

The authors apply various data augmentation techniques to a corpus of Dutch oral cancer speech to create synthetic data and finetune Whisper and MMS models, achieving an average 8% relative WER decrease with TTS augmentation. Employing LLMs for error correction provides an additional 21.4-26.2% relative decrease for finetuned models, resulting in overall 40% and 50% relative WER decreases for Whisper and MMS respectively.
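Because every figure in the core claim is a relative WER change, it helps to pin down the two quantities involved. The following is a minimal sketch, assuming the usual definition of WER as word-level edit distance divided by the number of reference words, pooled over the corpus. The function names and the toy Dutch sentences are illustrative assumptions and do not come from the paper.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    return errors / words


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER decrease, e.g. 0.30 -> 0.18 is a 40% relative decrease."""
    return (wer_before - wer_after) / wer_before


# Toy example (hypothetical sentences): one dropped word out of six.
print(word_errors("de kat zit op de mat", "de kat zit op mat"))  # (1, 6)
```

A relative decrease says nothing about the starting point: going from 0.30 to 0.18 and going from 0.90 to 0.54 are both 40% relative decreases, which is why the absolute baseline matters for judging practical impact.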

What carries the argument

Data augmentation techniques, particularly text-to-speech synthesis, combined with large language model-based error correction to improve ASR performance on impaired speech.

If this is right

Finetuning ASR models on OC speech data augmented with TTS reduces WER by about 8% relative on average.
LLM error correction further decreases WER by a relative 21.4-26.2% for finetuned models and 10.0% for non-finetuned ones.
The combined approach achieves a 40% relative WER decrease for Whisper and 50% for MMS on OC speech.
This strategy is viable for recognizing speech from patients treated for oral cancer.
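When reading stage-wise and overall figures side by side, note that successive relative reductions multiply rather than add. The worked equation below is a sketch under an assumption: that each stage's reduction is measured against the output of the previous stage and the overall figure against the initial baseline. The symbols and the numbers in the example are hypothetical and are not taken from the paper.

```latex
% W_0: baseline WER, W_1: WER after stage 1, W_2: WER after stage 2
r_1 = \frac{W_0 - W_1}{W_0}, \qquad
r_2 = \frac{W_1 - W_2}{W_1}, \qquad
r_{\text{overall}} = \frac{W_0 - W_2}{W_0} = 1 - (1 - r_1)(1 - r_2)

% Hypothetical example: r_1 = 0.20 and r_2 = 0.25
r_{\text{overall}} = 1 - 0.80 \times 0.75 = 0.40 \quad (\text{not } 0.20 + 0.25 = 0.45)
```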

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar augmentation and correction methods could be tested on other speech impairments like those from stroke or Parkinson's disease.
Integrating these techniques into real-time ASR applications might improve accessibility for medical patients in daily use.
The success suggests that synthetic data can bridge gaps in medical speech datasets where real recordings are hard to obtain.

Load-bearing premise

The synthetic speech samples must match the real speech variations of oral cancer patients closely enough, and the language model fixes must not change any medically important information.

What would settle it

Testing the improved models on a new set of real oral cancer speech recordings and finding no reduction in word error rates compared to baseline would disprove the effectiveness of the augmentation and correction pipeline.

Original abstract

In recent years, the performance of automatic speech recognition (ASR) systems has made considerable progress. Unfortunately, for people with speech impairments, such as people treated for oral cancer (OC), ASR performance is still lagging behind. The scarcity and variability of OC speech data makes development of ASR models for this type of speech difficult. In this work, we use data augmentation and large language model (LLM) error correction to mitigate this problem. We apply various augmentation techniques on a corpus of Dutch oral cancer speech to create synthetic data, and evaluate their effect on ASR performance. We finetune Whisper and Massively Multilingual Speech (MMS) models for each augmentation technique and observe, on average, an 8% relative decrease in Word Error Rate (WER) when including data created using text-to-speech (TTS). When employing LLMs for error correction, we see a further 21.4-26.2% relative decrease in WER for finetuned ASR models and a 10.0% relative decrease for non-finetuned models. Overall, we achieve a 40% relative WER decrease for Whisper and a 50% relative WER decrease for MMS, indicating that a combination of data augmentation and LLM correction is a viable strategy for the recognition of OC speech.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gets 40-50% relative WER drops on Dutch oral cancer speech from TTS augmentation plus LLM correction, but the synthetic data likely misses the real acoustic distortions that define post-treatment speech.

The main thing to know is that this paper gets a 40% relative WER reduction for Whisper and 50% for MMS on Dutch oral cancer speech by combining TTS-based data augmentation with LLM error correction. The second point is that the gains come mostly from the LLM step after some finetuning.

They start with a corpus of real OC speech and generate synthetic data using various augmentation methods, including TTS. Finetuning the models on this mix gives an average 8% relative WER drop. Then applying LLMs to correct the ASR outputs adds another 21-26% relative improvement for the finetuned versions. The overall numbers look decent for an applied task where data is scarce.

What they do well is focus on a medically relevant group. People treated for oral cancer often have persistent speech changes that make standard ASR unreliable, and this work tries to bridge that gap with tools that are already available. The pipeline is straightforward and could be replicated by others working on similar impaired speech problems.

The soft spots are worth noting. The results are all relative, with no absolute WER figures or details on the original error rates, which makes it difficult to see the practical impact. There is also no discussion of statistical significance or exact dataset sizes in the provided summary. On the augmentation itself, the concern about TTS not capturing the specific distortions like hypernasality or formant shifts after treatment seems reasonable. If the synthetic data sounds too normal, finetuning might improve performance on clean speech but not necessarily on the real patient test set. The LLM correction step would benefit from some check that it preserves clinical content accurately.

This paper is aimed at researchers in automatic speech recognition who deal with clinical or impaired speech data. Anyone building assistive tech for healthcare would find the approach interesting, especially the combination of augmentation and post-correction. It deserves a serious referee because the topic is important and the empirical results, while preliminary, point to a workable strategy that others could build on. I would recommend sending it for peer review with the expectation that reviewers will ask for more on the data characteristics and error analysis.

Referee Report

3 major / 1 minor

Summary. The manuscript describes the use of data augmentation (including TTS synthesis on Dutch OC speech transcripts) to generate synthetic training data for fine-tuning Whisper and MMS ASR models, followed by LLM-based error correction on the ASR outputs. It reports an average 8% relative WER reduction from TTS augmentation, additional 21.4-26.2% relative reductions from LLM correction on fine-tuned models, and overall relative WER decreases of 40% for Whisper and 50% for MMS, concluding that the combination is a viable strategy for OC speech recognition.

Significance. If the reported relative gains are confirmed with absolute WER values, proper baselines, statistical tests, and evidence that synthetic data matches real OC acoustic characteristics, the work would demonstrate a practical, low-resource approach to improving ASR for a clinically important impaired-speech population. The empirical focus on a real patient corpus and the combination of augmentation with LLM post-processing are strengths that could inform follow-on studies, though the current evidence leaves the magnitude and generalizability of the gains only partially supported.

major comments (3)

[Abstract] Abstract: the headline claims of 40% and 50% relative WER reductions are presented without any absolute baseline WER values, number of speakers or utterances in the test set, dataset sizes, or statistical significance tests, preventing assessment of whether the improvements are practically meaningful or robust.
[Data Augmentation] Data augmentation and evaluation protocol: the central assumption that TTS-generated synthetic data sufficiently captures post-treatment articulatory distortions (altered formants, hypernasality, consonant distortions) is not supported by any acoustic analysis or direct comparison of synthetic versus real OC recordings; if the test set consists of real patient speech, observed WER drops may reflect domain mismatch rather than improved robustness.
[LLM Error Correction] LLM error correction: the additional WER reductions from LLM post-processing are reported without any domain-specific validation or error analysis confirming that medically relevant terminology and clinical intent are preserved; this is a load-bearing assumption for the claim that the pipeline is viable for OC speech.
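The first major comment asks for statistical significance tests. One common way to provide them for WER comparisons is a paired bootstrap over test utterances, sketched below. This is an illustration of a standard technique, not the procedure used in the paper; the function name, the 95% percentile interval, and the toy counts are assumptions.

```python
import random


def bootstrap_wer_diff(errs_a, errs_b, ref_lens, n_resamples=2000, seed=0):
    """Paired bootstrap over utterances for the corpus WER difference (A minus B).

    errs_a, errs_b: per-utterance word error counts for systems A and B.
    ref_lens: per-utterance reference word counts (positive, shared by both systems).
    Returns (observed_diff, ci_low, ci_high) using a 95% percentile interval.
    """
    n = len(ref_lens)
    assert len(errs_a) == len(errs_b) == n and n > 0
    rng = random.Random(seed)
    observed = (sum(errs_a) - sum(errs_b)) / sum(ref_lens)
    diffs = []
    for _ in range(n_resamples):
        # Resample utterance indices with replacement, keeping the A/B pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        words = sum(ref_lens[i] for i in idx)
        diffs.append(sum(errs_a[i] - errs_b[i] for i in idx) / words)
    diffs.sort()
    ci_low = diffs[int(0.025 * n_resamples)]
    ci_high = diffs[int(0.975 * n_resamples) - 1]
    return observed, ci_low, ci_high


# Hypothetical per-utterance counts: system B makes fewer errors on every utterance.
obs, low, high = bootstrap_wer_diff([3, 4, 2, 5, 3, 4], [1, 2, 1, 3, 2, 2], [10] * 6)
print(obs, low, high)
```

If the interval excludes zero, the difference between the two systems is unlikely to be an artifact of which utterances happened to be in the test set. With few speakers, resampling speakers rather than utterances would be the more conservative choice.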

minor comments (1)

[Abstract] The description of the 8% average relative decrease from TTS augmentation does not specify how the average is computed across the different augmentation techniques and the two base models.
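The minor comment matters because different averaging conventions give different headline numbers. The sketch below contrasts two plausible conventions on made-up values; which one (if either) the paper uses is not stated in the abstract, and the WER figures here are hypothetical.

```python
from statistics import mean


def rel_reduction(before: float, after: float) -> float:
    return (before - after) / before


# Hypothetical (baseline WER, WER with augmentation) pairs, one per model.
runs = {
    "whisper": (0.40, 0.36),
    "mms": (0.20, 0.19),
}

# Option A: average the per-run relative reductions.
macro = mean(rel_reduction(b, a) for b, a in runs.values())

# Option B: relative reduction of the averaged WERs.
mean_before = mean(b for b, _ in runs.values())
mean_after = mean(a for _, a in runs.values())
of_means = rel_reduction(mean_before, mean_after)

print(f"average of relative reductions: {macro:.3f}")
print(f"relative reduction of averages: {of_means:.3f}")
```

On these toy values the two conventions give 7.5% and about 8.3%, so a reported average is only interpretable once the convention is stated.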

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We have addressed each major comment in detail below.

Referee: [Abstract] Abstract: the headline claims of 40% and 50% relative WER reductions are presented without any absolute baseline WER values, number of speakers or utterances in the test set, dataset sizes, or statistical significance tests, preventing assessment of whether the improvements are practically meaningful or robust.

Authors: We agree that the abstract would benefit from additional context to allow readers to better evaluate the practical significance of the results. In the revised manuscript, we have updated the abstract to report the absolute baseline WER values for the models, the number of speakers and utterances in the test set, the relevant dataset sizes, and a note on the statistical significance of the observed improvements. revision: yes
Referee: [Data Augmentation] Data augmentation and evaluation protocol: the central assumption that TTS-generated synthetic data sufficiently captures post-treatment articulatory distortions (altered formants, hypernasality, consonant distortions) is not supported by any acoustic analysis or direct comparison of synthetic versus real OC recordings; if the test set consists of real patient speech, observed WER drops may reflect domain mismatch rather than improved robustness.

Authors: We thank the referee for this observation. Our TTS augmentation is based on transcripts from the OC corpus to increase exposure to domain-specific lexical content and sentence structures rather than to synthesize the precise acoustic distortions of impaired speech. The test set consists of real patient recordings, and the reported gains are empirical. We have revised the manuscript to include an explicit discussion of this limitation of the augmentation approach and its implications for interpreting the source of the WER reductions. revision: partial
Referee: [LLM Error Correction] LLM error correction: the additional WER reductions from LLM post-processing are reported without any domain-specific validation or error analysis confirming that medically relevant terminology and clinical intent are preserved; this is a load-bearing assumption for the claim that the pipeline is viable for OC speech.

Authors: We agree that domain-specific validation is important for this component. In the revised manuscript, we have added an error analysis of the LLM corrections that examines preservation of medically relevant terminology and clinical intent, including representative examples and a summary of the types of changes made by the LLM. revision: yes
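One simple form the promised terminology check could take is to flag protected terms that the ASR hypothesis contained but the LLM-corrected text no longer does. The sketch below is an assumption about how such a check might look, not the authors' analysis; the function names, the term list, and the Dutch example sentences are all hypothetical, and it only handles single-word terms.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of a transcript."""
    return set(re.findall(r"\w+", text.lower()))


def dropped_terms(asr_text: str, corrected_text: str, protected_terms) -> list[str]:
    """Protected terms present in the ASR hypothesis but missing after correction."""
    before, after = tokens(asr_text), tokens(corrected_text)
    return sorted(
        t for t in protected_terms
        if t.lower() in before and t.lower() not in after
    )


# Hypothetical example: the correction step replaced a clinical term.
asr = "patient heeft pijn bij slikken na radiotherapie"
corrected = "patient heeft pijn bij slikken na therapie"
print(dropped_terms(asr, corrected, {"radiotherapie", "slikken"}))  # ['radiotherapie']
```

A check like this only catches terms the LLM removed. Terms it newly introduces, or corrections that change meaning without touching the term list, would still need manual review.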

Circularity Check

0 steps flagged

No circularity: purely empirical ASR finetuning and evaluation

The manuscript presents an experimental pipeline: data augmentation via TTS and other techniques on Dutch OC transcripts, finetuning of Whisper and MMS models, followed by LLM-based error correction, with WER measured on held-out real OC recordings. No equations, uniqueness theorems, or self-citations are invoked to derive the reported 40-50% relative WER reductions; the gains are direct empirical outcomes of the described training and inference steps. The central claims rest on standard ML evaluation protocols rather than any reduction to author-defined parameters or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard machine-learning assumptions about data augmentation and post-processing rather than new free parameters or invented entities.

axioms (1)

Domain assumption: Synthetic speech generated by TTS can usefully supplement scarce real recordings of oral cancer speech for model training.
This premise underpins the data-augmentation experiments described in the abstract.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

[1]

The global incidence of lip, oral cavity, and pharyngeal cancers by subsite in 2012,

K. D. Shield, J. Ferlay, A. Jemal, R. Sankaranarayanan, A. K. Chaturvedi, F. Bray, and I. Soerjomataram, “The global incidence of lip, oral cavity, and pharyngeal cancers by subsite in 2012,”CA: a cancer journal for clinicians, vol. 67, no. 1, pp. 51–64, 2017

work page 2012
[2]

Cancer statistics for the year 2020: An overview,

J. Ferlay, M. Colombet, I. Soerjomataram, D. M. Parkin, M. Pi ˜neros, A. Znaor, and F. Bray, “Cancer statistics for the year 2020: An overview,”International Journal of Cancer, vol. 149, no. 4, pp. 778–789, 2021. [Online]. Available: https://onlinelibrary.wiley.com/doi/ abs/10.1002/ijc.33588

work page doi:10.1002/ijc.33588 2020
[3]

Speech Deficits Associated with Oral and Oropharyngeal Carcinomas,

G. Constantinescu and J. M. Rieger, “Speech Deficits Associated with Oral and Oropharyngeal Carcinomas,” inClinical Care and Rehabilitation in Head and Neck Cancer, P. C. Doyle, Ed. Springer International Publishing, 2019, pp. 265–279. [Online]. Available: https://doi.org/10.1007/978-3-030-04702-3 16

work page doi:10.1007/978-3-030-04702-3 2019
[4]

Speech Disorders Related to Head and Neck Cancer,

T. Bressmann, “Speech Disorders Related to Head and Neck Cancer,” inThe Handbook of Language and Speech Disorders. John Wiley & Sons, Ltd, 2021, pp. 495–527. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119606987.ch22

work page doi:10.1002/9781119606987.ch22 2021
[5]

Articulatory–kinematic changes in speech following surgical treatment for oral or oropharyngeal cancer: A systematic review,

T. B. Tienkamp, T. Rebernik, R. A. D’Cruz, R. van Son, M. Wieling, M. J. H. Witjes, S. de Visscher, and D. Abur, “Articulatory–kinematic changes in speech following surgical treatment for oral or oropharyngeal cancer: A systematic review,”International Journal of Language & Communication Disorders, vol. 60, no. 1, p. e13148, 2025. [Online]. Available: htt...

work page doi:10.1111/1460-6984.13148 2025
[6]

Transformers in speech processing: Overcoming challenges and paving the future,

S. Latif, S. A. M. Zaidi, H. Cuay ´ahuitl, F. Shamshad, M. Shoukat, M. Usama, and J. Qadir, “Transformers in speech processing: Overcoming challenges and paving the future,”Computer Science Review, vol. 58, p. 100768, 2025. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S1574013725000449

work page 2025
[7]

Low-resource automatic speech recognition and error analyses of oral cancer speech,

B. M. Halpern, S. Feng, R. Van Son, M. Van Den Brekel, and O. Scharenborg, “Low-resource automatic speech recognition and error analyses of oral cancer speech,”Speech Communication, vol. 141, pp. 14–27, 2022-06. [Online]. Available: https://linkinghub.elsevier.com/ retrieve/pii/S0167639322000620

work page 2022
[8]

Automatic speech recognition and error analyses of Dutch oral cancer speech,

K. Wildenburg, “Automatic speech recognition and error analyses of Dutch oral cancer speech,” Master’s thesis, University of Groningen,

work page
[9]

Available: https://campus-fryslan.studenttheses.ub.rug

[Online]. Available: https://campus-fryslan.studenttheses.ub.rug. nl/224/

work page
[10]

Overview of automatic speech analysis and technologies for neurodegenerative disorders: Di- agnosis and assistive applications,

S. A. Sheikh, M. Sahidullah, and I. Kodrasi, “Overview of automatic speech analysis and technologies for neurodegenerative disorders: Di- agnosis and assistive applications,”IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 700–716, 2025

work page 2025
[11]

A survey of technologies for automatic Dysarthric speech recognition,

Z. Qian, K. Xiao, and C. Yu, “A survey of technologies for automatic Dysarthric speech recognition,”EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, no. 1, p. 48, 2023-11-

work page 2023
[12]

Available: https://asmp-eurasipjournals.springeropen.com/ articles/10.1186/s13636-023-00318-2

[Online]. Available: https://asmp-eurasipjournals.springeropen.com/ articles/10.1186/s13636-023-00318-2

work page doi:10.1186/s13636-023-00318-2
[13]

Audio augmentation for speech recognition,

T. Ko, V . Peddinti, D. Povey, and S. Khudanpur, “Audio augmentation for speech recognition,” inProc. Interspeech 2015. ISCA, 2015-09- 06, pp. 3586–3589. [Online]. Available: https://www.isca-archive.org/ interspeech 2015/ko15 interspeech.html

work page 2015
[14]

V ocal Tract Length Perturbation (VTLP) improves speech recognition,

N. Jaitly and G. E. Hinton, “V ocal Tract Length Perturbation (VTLP) improves speech recognition,”ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013

work page 2013
[15]

Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition,

B. Vachhani, C. Bhat, and S. K. Kopparapu, “Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition,” in Proc. Interspeech 2018. ISCA, 2018-09-02, pp. 471–475. [Online]. Available: https://www.isca-archive.org/interspeech 2018/vachhani18 interspeech.html

work page 2018
[16]

Investigation of Data Augmentation Techniques for Disordered Speech Recognition,

M. Geng, X. Xie, S. Liu, J. Yu, S. Hu, X. Liu, and H. Meng, “Investigation of Data Augmentation Techniques for Disordered Speech Recognition,” inProc. Interspeech 2020, 2020, pp. 696–

work page 2020
[17]

Available: https://www.isca-archive.org/interspeech 2020/geng20 interspeech.pdf

[Online]. Available: https://www.isca-archive.org/interspeech 2020/geng20 interspeech.pdf

work page 2020
[18]

Exploring Alternative Data Augmentation Methods in Dysarthric Automatic Speech Recognition,

R. Gracelli and J. Almeida, “Exploring Alternative Data Augmentation Methods in Dysarthric Automatic Speech Recognition,” inProc. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), 2024-06, pp. 243–248. [Online]. Available: https://ieeexplore.ieee.org/document/10600718/

work page arXiv 2024
[19]

Improving Dysarthric Speech Segmentation With Emulated and Synthetic Augmentation,

S. A. Naeini, L. Simmatis, D. Jafari, Y . Yunusova, and B. Taati, “Improving Dysarthric Speech Segmentation With Emulated and Synthetic Augmentation,”IEEE Journal of Translational Engineering in Health and Medicine, vol. 12, pp. 382–389, 2024. [Online]. Available: https://ieeexplore.ieee.org/document/10464345/

work page arXiv 2024
[20]

Unsupervised Rhythm and V oice Conversion to Improve ASR on Dysarthric Speech,

K. El Hajal, E. Hermann, S. Hovsepyan, and M. M. Doss, “Unsupervised Rhythm and V oice Conversion to Improve ASR on Dysarthric Speech,” inProc. Interspeech 2025, 2025, pp. 2760–

work page 2025
[21]

Available: https://www.isca-archive.org/interspeech 2025/elhajal25 interspeech.html#

[Online]. Available: https://www.isca-archive.org/interspeech 2025/elhajal25 interspeech.html#

work page 2025
[22]

Towards Inclusive ASR: Investigating V oice Conversion for Dysarthric Speech Recognition in Low-Resource Languages,

C.-J. Li, E. Yeo, K. Choi, P. A. P ´erez-Toro, M. Someki, R. K. Das, Z. Yue, J. R. Orozco-Arroyave, E. N ¨oth, and D. R. Mortensen, “Towards Inclusive ASR: Investigating V oice Conversion for Dysarthric Speech Recognition in Low-Resource Languages,” in Proc. Interspeech 2025, 2025, pp. 2128–2132. [Online]. Available: https://www.isca-archive.org/interspee...

work page 2025
[23]

Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition,

M. Soleymanpour, M. T. Johnson, R. Soleymanpour, and J. Berry, “Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition,” inProc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022-05-23, pp. 7382–7386. [Online]. Available: https:// ieeexplore.ieee.org/document/9746585/

work page arXiv 2022
[24]

Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation,

E. Hermann and M. Magimai. Doss, “Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation,” inProc. Interspeech 2023, 2023, pp. 156–160. [Online]. Available: https: //www.isca-archive.org/interspeech 2023/hermann23 interspeech.html

work page 2023
[25]

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis,

W.-Z. Leung, M. Cross, A. Ragni, and S. Goetze, “Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis,” inProc. Interspeech 2024, 2024, pp. 2494–2498. [Online]. Available: https://www.isca-archive. org/interspeech 2024/leung24 interspeech.html

work page 2024
[26]

Improving Severity Preservation of Healthy-to-Pathological V oice Conversion With Global Style Tokens,

B. M. Halpern, W.-C. Huang, L. P. Violeta, R. van Son, and T. Toda, “Improving Severity Preservation of Healthy-to-Pathological V oice Conversion With Global Style Tokens,” inProc. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023, pp. 1–7. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/ 10389707

work page 2023
[27]

The design for the wall street journal-based csr corpus,

D. B. Paul and J. M. Baker, “The design for the wall street journal-based csr corpus,” inProc. HLT ’91 Workshop on Speech and Natural Language. USA: Association for Computational Linguistics, 1992, p. 357–362. [Online]. Available: https://doi.org/10.3115/1075527.1075614

work page doi:10.3115/1075527.1075614 1992
[28]

Het Corpus Gesproken Nederlands,

N. Oostdijk, “Het Corpus Gesproken Nederlands,” 1999. [Online]. Available: https://hdl.handle.net/2066/76350

work page 1999
[29]

Manipulation of oral cancer speech using neural articulatory synthesis,

B. M. Halpern, T. Rebernik, T. Tienkamp, R. van Son, M. van den Brekel, M. Wieling, M. Witjes, and O. Scharenborg, “Manipulation of oral cancer speech using neural articulatory synthesis,” 2022-03-31, pre-published. [Online]. Available: http://arxiv.org/abs/2203.17072

work page arXiv 2022
[30]

Robust Dysarthric Speech Recognition with GAN Enhancement and LLM Correction,

Y . He, K. P. Seng, C. S. Lim, and L. M. Ang, “Robust Dysarthric Speech Recognition with GAN Enhancement and LLM Correction,”Advanced Intelligent Systems, p. e202500873, 2025-10. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202500873

work page doi:10.1002/aisy.202500873 2025
[31]

Robust Speech Recognition via Large-Scale Weak Super- vision,

A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Super- vision,” inProc. 2023 International Conference on Machine Learning (ICML), 2022-06-12

work page 2023
[32]

Scaling Speech Technology to 1,000+ Languages,

V . Pratap, A. Tjandra, B. Shi, P. Tomasello, A. Babu, S. Kundu, A. Elkahky, Z. Ni, A. Vyas, M. Fazel-Zarandiet al., “Scaling Speech Technology to 1,000+ Languages,”Journal of Machine Learning Research, vol. 25, no. 97, pp. 1–52, 2024. [Online]. Available: https://jmlr.org/papers/volume25/23-1318/23-1318.pdf

work page 2024
[33]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=nZeVKeeFYf9

work page 2022
[34]

Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation,

C. Bhat, A. Panda, and H. Strik, “Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation,” inProc. Interspeech 2022. ISCA, 2022-09-18, pp. 46–50. [Online]. Available: https://www.isca-archive.org/interspeech 2022/bhat22 interspeech.html

work page 2022
[35]

V oice Conversion With Just Nearest Neighbors,

M. Baas, B. van Niekerk, and H. Kamper, “V oice Conversion With Just Nearest Neighbors,” inProc. Interspeech 2023, 2023-05-

work page 2023
[36]

Available: https://www.isca-archive.org/interspeech 2023/ baas23 interspeech.html

[Online]. Available: https://www.isca-archive.org/interspeech 2023/ baas23 interspeech.html

work page 2023
[37]

WavLM: Large-Scale Self-Supervised Pre- Training for Full Stack Speech Processing,

S. Chen, C. Wang, Z. Chen, Y . Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y . Qian, Y . Qian, J. Wu, M. Zeng, X. Yu, and F. Wei, “WavLM: Large-Scale Self-Supervised Pre- Training for Full Stack Speech Processing,”IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022-06-17. [Onlin...

work page arXiv 2022
[38]

XTTS: A Massively Multilingual Zero-Shot Text-to-Speech Model,

E. Casanova, K. Davis, E. G ¨olge, G. G ¨oknar, I. Gulea, L. Hart, A. Aljafari, J. Meyer, R. Morais, S. Olayemi, and J. Weber, “XTTS: A Massively Multilingual Zero-Shot Text-to-Speech Model,” inProc. Interspeech 2024, 2024-06-07. [Online]. Available: https: //www.isca-archive.org/interspeech 2024/casanova24 interspeech.html

work page 2024
[39]

Wablieft: An Easy-to- Read Newspaper Corpus for Dutch,

V . Vandeghinste, B. Bult ´e, and L. Augustinus, “Wablieft: An Easy-to- Read Newspaper Corpus for Dutch,”Proceedings of CLARIN Annual Conference 2019, pp. 188–191, 2019-10-01

work page 2019
[40]

Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition,

Z. Yue, F. Xiong, H. Christensen, and J. Barker, “Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition,” inProc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020-05, pp. 6094–6098. [Online]. Available: https://ieeexplore.ieee. org/document/9054343/

work page arXiv 2020
[41]

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition,

L. Prananta, B. Halpern, S. Feng, and O. Scharenborg, “The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition,” inProc. Interspeech

work page
[42]

ISCA, 2022-09-18, pp. 36–40. [Online]. Available: https: //www.isca-archive.org/interspeech 2022/prananta22 interspeech.html

work page 2022
[43]

On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches,

G. Schu, P. Janbakhshi, and I. Kodrasi, “On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches,” inProc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. [Online]. Available: https://ieeexplore.ieee.org/document/10095981/

work page arXiv 2023
[44]

Accurate synthesis of dysarthric Speech for ASR data augmentation,

M. Soleymanpour, M. T. Johnson, R. Soleymanpour, and J. Berry, “Accurate synthesis of dysarthric Speech for ASR data augmentation,”Speech Communication, vol. 164, p. 103112, 2024-10-

work page 2024
[45]

Available: https://www.sciencedirect.com/science/article/ pii/S0167639324000839

[Online]. Available: https://www.sciencedirect.com/science/article/ pii/S0167639324000839

work page
[46]

Nasaliteitsmeting met de nasometer,

J. van de Weijer and I. Slis, “Nasaliteitsmeting met de nasometer,” Logopedie & F oniatrie, vol. 63, pp. 97–101, 1991. [Online]. Available: https://hdl.handle.net/2066/323177

work page 1991
[47]

De ontwikkeling van een fonetisch gebalanceerde standaardtekst,

H. Martens, G. Nuffelen, and M. Bodt, “De ontwikkeling van een fonetisch gebalanceerde standaardtekst,”Logopedie, vol. 23, pp. 31–36, 01 2010

work page 2010
[48]

The IFA corpus: a phonemically segmented Dutch

R. van Son, D. Binnenpoorte, H. Heuvel, and L. Pols, “The IFA corpus: a phonemically segmented Dutch ”open source” speech database,” in Proc. Eurospeech 2001, 2001

work page 2001
[49]

librosa: Audio and music signal analysis in Python

B. McFee, C. Raffel, D. Liang, D. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “Librosa: Audio and Music Signal Analysis in Python,” inProc. Python in Science Conference, 2015, pp. 18–24. [Online]. Available: https://doi.org/10.25080/Majora-7b98e3ed-003

work page doi:10.25080/majora-7b98e3ed-003 2015
[50]

TorchAudio: Building Blocks for Audio and Speech Processing,

Y .-Y . Yang, M. Hira, Z. Ni, A. Chourdia, A. Astafurov, C. Chen, C.-F. Yeh, C. Puhrsch, D. Pollack, D. Genzel, D. Greenberg, E. Z. Yang, J. Lian, J. Mahadeokar, J. Hwang, J. Chen, P. Goldsborough, P. Roy, S. Narenthiran, S. Watanabe, S. Chintala, V . Quenneville- B´elair, and Y . Shi, “TorchAudio: Building Blocks for Audio and Speech Processing,” inProc....

work page 2022
[51]

NLP augmentation,

E. Ma, “NLP augmentation,” 2019. [Online]. Available: https: //github.com/makcedward/nlpaug

work page 2019
[52]

Common voice: A massively-multilingual speech corpus,

R. Ardila, M. Branson, K. Davis, M. Kohler, J. Meyer, M. Henretty, R. Morais, L. Saunders, F. Tyers, and G. Weber, “Common voice: A massively-multilingual speech corpus,” inProceedings of the Twelfth Language Resources and Evaluation Conference. European Language Resources Association, 2020-05, pp. 4218–4222. [Online]. Available: https://aclanthology.org/...

[53]

A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “Wav2vec 2.0: A framework for self-supervised learning of speech representations,” Advances in Neural Information Processing Systems, vol. 33, pp. 12 449–12 460, 2020

[54]

S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, and B. Bossan, “PEFT: State-of-the-art Parameter-Efficient Fine-Tuning Methods,” [Online]. Available: https://github.com/huggingface/peft

[56]

B. M. Halpern, T. Tienkamp, D. Abur, and T. Toda, “Towards Explainable Reference-Free Speech Intelligibility Evaluation of People with Pathological Speech,” 2025, unpublished

[57]

Y. Mao, Y. Ge, Y. Fan, W. Xu, Y. Mi, Z. Hu, and Y. Gao, “A Survey on LoRA of Large Language Models,” Frontiers of Computer Science, vol. 19, no. 7, p. 197605, 2024-12-14. [Online]. Available: https://doi.org/10.1007/s11704-024-40663-9

[58]

Z. Song, J. Zhuo, Y. Yang, Z. Ma, S. Zhang, and X. Chen, “LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR,” in Proc. Interspeech 2024, 2024. [Online]. Available: https://www.isca-archive.org/interspeech_2024/song24_interspeech.html

[59]

P. Gabler, B. C. Geiger, B. Schuppler, and R. Kern, “Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition,” Information, vol. 14, no. 2, p. 137, 2023-02. [Online]. Available: https://www.mdpi.com/2078-2489/14/2/137

[60]

S. A. Sheikh and I. Kodrasi, “Impact of Speech Mode in Automatic Pathological Speech Detection,” in Proc. 2024 European Signal Processing Conference (EUSIPCO), 2024. [Online]. Available: https://ieeexplore.ieee.org/document/10714947/

[61]

J. Tobin, P. Nelson, B. MacDonald, R. Heywood, R. Cave, K. Seaver, A. Desjardins, P.-P. Jiang, and J. R. Green, “Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech,” Journal of Speech, Language, and Hearing Research, vol. 67, no. 11, pp. 4176–4185, 2024-11-07. [Online]. Available: https://pubs.asha.org/doi/10.1044/2...

[62]

Y.-L. Liu, R. Feng, J.-H. Yuan, and Z.-H. Ling, “Clever Hans Effect Found in Automatic Detection of Alzheimer’s Disease through Speech,” in Proc. Interspeech 2024, 2024, pp. 2435–2439. [Online]. Available: https://www.isca-archive.org/interspeech_2024/liu24f_interspeech.html
