Quantifying bias in automatic speech recognition
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Citing papers
- Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
  The paper delivers a unified framework for fairness in speech technologies by formalizing seven definitions, organizing research into three paradigms, diagnosing pipeline-specific biases, and mapping mitigations to those sources.
- "This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias
  ASR bias causes users from underrepresented dialects to internalize failures as personal inadequacy and perform extensive emotional and linguistic labor, revealing harms missed by accuracy-only evaluations.
- Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing
  Few-shot TTS adaptation combined with LLM-guided phoneme editing produces synthetic accented speech that improves ASR word error rates on real accented audio, even in cross-speaker and ultra-low-data settings.
- Demographic and Linguistic Bias Evaluation in Omnimodal Language Models
  Omnimodal models show less demographic bias on image and video tasks than on audio tasks, where biases are substantial and performance is lower.