"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias
Pith reviewed 2026-05-09 23:36 UTC · model grok-4.3
The pith
ASR bias evaluations based on accuracy alone overlook the emotional labor, self-monitoring, and internalized inadequacy that speakers of non-standard dialects experience.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Qualitative analysis of open-ended narratives from participants across Atlanta, the Gulf Coast, Miami Beach, and Tucson shows that ASR systems encode particular varieties as standard while marginalizing others. Users perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to compensate, yet still internalize failures as personal shortcomings despite recognizing that the systems were not designed for them. Algorithmic fairness assessments based on accuracy metrics alone therefore miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.
What carries the argument
Qualitative analysis of user narratives from dialect-specific UX studies, uncovering invisible adaptation labor and internalized harm beyond what error counts capture.
If this is right
- Fairness evaluations of ASR systems must track emotional labor and psychological impact in addition to word-error rates (a combined-reporting sketch follows this list).
- Speech technology design should treat diverse language varieties as legitimate targets rather than sources of user adaptation.
- Users' expressed willingness to contribute data and feedback creates opportunities for participatory improvement that values their linguistic knowledge.
- Awareness that systems are biased does not prevent users from internalizing failures as personal inadequacy.
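To make the first implication concrete, here is a minimal sketch of what a combined fairness report could look like: per-group word-error rate (standard word-level Levenshtein distance) placed side by side with self-reported adaptation effort. The group names, transcript pairs, and Likert ratings below are invented for illustration; this is not the paper's instrument.

```python
# Minimal sketch: pairing word-error rate (WER) with self-reported
# experience measures in one per-group fairness report.
# All data below is hypothetical illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word-error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical per-group data: (reference, ASR hypothesis) pairs plus
# 1-5 Likert ratings of adaptation effort (code-switching,
# hyper-articulation, emotional management).
groups = {
    "standard_dialect": {
        "pairs": [("turn on the lights", "turn on the lights")],
        "effort_ratings": [1, 2, 1],
    },
    "regional_dialect": {
        "pairs": [("turn on the lights", "turn on the light")],
        "effort_ratings": [4, 5, 4],
    },
}

for name, data in groups.items():
    mean_wer = sum(wer(r, h) for r, h in data["pairs"]) / len(data["pairs"])
    mean_effort = sum(data["effort_ratings"]) / len(data["effort_ratings"])
    print(f"{name}: WER={mean_wer:.2f}, self-reported effort={mean_effort:.1f}/5")
```

The point of the design is that the second column cannot be derived from the first: two groups with similar WER can still diverge sharply in the effort they report expending.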
Where Pith is reading between the lines
- Voice interfaces in other domains such as virtual assistants may impose comparable unseen adaptation costs on the same speaker groups.
- Incorporating measures of user effort and cultural fit during model development could reduce the need for compensatory behaviors like code-switching.
- Qualitative methods that surface non-quantifiable harms could be extended to fairness audits of additional language technologies.
Load-bearing premise
That participants' open-ended interview responses accurately reflect the full emotional and cognitive costs without distortion from the interview context, self-reporting biases, or researcher interpretation.
What would settle it
A controlled comparison finding no difference in reported emotional labor, self-monitoring effort, or feelings of inadequacy between standard-dialect and non-standard-dialect ASR users, with error rates held constant, would falsify the claim that accuracy metrics miss these harms. A sketch of such a comparison follows.
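A hedged sketch of that falsifying comparison, assuming hypothetical composite survey scores collected from sessions engineered to hold WER constant across groups (the numbers are invented; only the design matters):

```python
# Sketch of the falsifying comparison: with per-user WER matched across
# groups, test whether self-reported emotional-labor scores still differ.
from scipy import stats

# Hypothetical 1-7 composite scores (emotional labor, self-monitoring,
# and inadequacy items), collected after sessions engineered to yield
# the same WER for both groups.
standard_dialect_scores = [2.1, 2.8, 3.0, 2.4, 2.6]
nonstandard_dialect_scores = [5.2, 4.8, 5.9, 5.1, 4.6]

t_stat, p_value = stats.ttest_ind(
    standard_dialect_scores, nonstandard_dialect_scores, equal_var=False
)
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.4f}")
# A null result (no group difference once WER is held constant) would
# undercut the claim that accuracy metrics miss distinct harms; a
# persistent gap would support it.
```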
read the original abstract
Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents findings from qualitative user experience studies conducted across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) with speakers of distinct English dialect communities. It claims that ASR bias evaluations relying solely on accuracy metrics overlook critical dimensions of harm: the emotional labor of repeated technological rejection, the cognitive burden of constant self-monitoring and code-switching, and the psychological toll of internalized feelings of inadequacy in one's native language variety. These harms persist despite participants' awareness that the systems were not designed for them and their expressed willingness to contribute to improvements.
Significance. If the results hold, the work meaningfully advances ASR fairness research by providing empirical evidence from participant narratives that accuracy-only assessments are incomplete, highlighting unmeasured experiential costs. The manuscript earns credit for supplying participant demographics, interview protocols, thematic coding processes, and direct quotes that ground the interpretive claims without internal contradictions or unsupported leaps.
minor comments (2)
- Abstract: The summary of findings would be strengthened by briefly noting sample sizes and the high-level structure of the thematic analysis (e.g., number of participants and coding steps), which are detailed in the full text but absent from the abstract.
- Section on methodology: Clarify whether inter-coder reliability metrics or member-checking procedures were used in the thematic analysis to further address potential interpretive bias, even if the current description is already transparent (a minimal inter-coder reliability computation is sketched after this list).
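On the second comment, a minimal sketch of the kind of inter-coder reliability figure being requested: Cohen's kappa computed over two coders' theme labels for the same excerpts. The theme codes and label sequences below are hypothetical, not the paper's codebook.

```python
# Minimal Cohen's kappa for two coders labeling the same excerpts.
# Labels are hypothetical theme codes.
from collections import Counter

def cohens_kappa(coder_a: list, coder_b: list) -> float:
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of excerpts coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement under independent coding, from marginal rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

coder_a = ["labor", "labor", "inadequacy", "expectation", "labor"]
coder_b = ["labor", "inadequacy", "inadequacy", "expectation", "labor"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # 0.69 here
```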
Simulated Author's Rebuttal
We thank the referee for the positive summary of our work, the recognition of its significance in advancing ASR fairness research beyond accuracy metrics, and the recommendation for minor revision. The assessment that our qualitative findings are grounded in participant narratives without unsupported claims is appreciated.
Circularity Check
No significant circularity in qualitative empirical study
full rationale
This is a qualitative user-experience study drawing on interview narratives from dialect speakers across four U.S. sites. It contains no equations, fitted parameters, predictions, or derivation chains. All load-bearing claims rest on direct participant quotes, thematic coding descriptions, and reported patterns, which are independent of any self-citation or internal redefinition. The work is therefore self-contained as straightforward empirical reporting.