Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

Allison Koenecke; Anna Seo Gyeong Choi; Hilke Schellmann; Katelyn Xiaoying Mei; Mona Sloane

read the original abstract

Automatic Speech Recognition (ASR) systems' growing use warrants robust auditing approaches to ensure equitable transcription quality, especially for people with speech disorders like aphasia who disproportionately depend on ASR. While academic and industry audits have revealed performance disparities across user populations, standard auditing practices often overlook nuances that risk masking harm to marginalized groups. We identify three common pitfalls in standard ASR audits: (1) adhering to one method of text standardization, which can mask variance in ASR performance and ignore the standardization preferences of marginalized communities; (2) displaying high-level demographic findings without considering performance disparities by nuanced intersectional subgroups, or conditioning on relevant acoustic properties; and (3) reporting only one gold-standard metric (Word Error Rate), which inadequately quantifies common generative AI errors like hallucinations. We propose a holistic auditing framework addressing these pitfalls, and in a case study of six popular ASR systems, find consistently worse ASR performance for speakers with aphasia relative to a control group. We call on practitioners to implement these robust, community-driven ASR auditing practices better suited for the rapidly changing ASR landscape.

Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

discussion (0)