pith. machine review for the scientific record.

arxiv: 2604.21148 · v2 · submitted 2026-04-22 · 💻 cs.CL

Recognition: unknown

"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:36 UTC · model grok-4.3

classification 💻 cs.CL
keywords automatic speech recognition · ASR bias · user experience · emotional impact · dialect variation · algorithmic fairness · qualitative analysis · linguistic diversity

The pith

ASR bias evaluations based on accuracy alone overlook the emotional labor, self-monitoring, and internalized inadequacy experienced by speakers of non-standard dialects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports user experience studies with speakers of distinct U.S. English dialects in four locations to examine how ASR system failures shape lived experiences. Participants describe technologies that ignore their cultural backgrounds, forcing repeated adjustments such as code-switching and hyper-articulation just to achieve basic functionality. These interactions produce frustration and feelings of inadequacy that persist even when users know the systems were not built for their varieties. Standard fairness checks that count only error rates therefore miss the cognitive burden of constant self-monitoring and the psychological costs of repeated technological rejection.
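To make concrete what an accuracy-only check actually measures, here is a minimal word-error-rate (WER) sketch. The transcripts are hypothetical illustrations, not the paper's data or its evaluation pipeline.

```python
# Minimal WER sketch: the only signal an accuracy-only fairness check sees.
# Transcripts are hypothetical, invented for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Identical WER, very different user experience: the metric cannot tell a
# first-attempt utterance from one produced after hyper-articulation and
# code-switching on a third retry.
print(wer("turn on the kitchen lights", "turn on the kitchen light"))  # 0.2
```

The score captures transcript mismatch and nothing else, which is precisely the gap the paper identifies.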

Core claim

Qualitative analysis of open-ended narratives from participants across Atlanta, Gulf Coast, Miami Beach, and Tucson shows that ASR systems encode particular varieties as standard while marginalizing others. Users perform extensive invisible labor including code-switching, hyper-articulation, and emotional management to compensate, yet still internalize failures as personal shortcomings despite recognizing that the systems were not designed for them. Algorithmic fairness assessments based on accuracy metrics alone therefore miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.

What carries the argument

Qualitative analysis of user narratives from dialect-specific UX studies that uncovers invisible adaptation labor and internalized harm beyond error counts.

If this is right

  • Fairness evaluations of ASR systems must track emotional labor and psychological impact in addition to word-error rates (a sketch of such an extended evaluation record follows this list).
  • Speech technology design should treat diverse language varieties as legitimate targets rather than sources of user adaptation.
  • Users' expressed willingness to contribute data and feedback creates opportunities for participatory improvement that values their linguistic knowledge.
  • Awareness that systems are biased does not prevent users from internalizing failures as personal inadequacy.
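As a hedged sketch of the first implication, an "accuracy-plus-experience" evaluation record might pair WER with measures of adaptation effort. The field names and values below are invented for illustration; the paper proposes no specific schema.

```python
# Hypothetical evaluation record pairing accuracy with experiential cost.
from dataclasses import dataclass

@dataclass
class SessionEvaluation:
    dialect: str            # self-identified language variety
    wer: float              # standard accuracy metric
    retries: int            # repeated attempts before success
    code_switched: bool     # did the user shift toward a standard variety?
    effort_rating: int      # self-reported effort, e.g. 1-7 Likert
    inadequacy_rating: int  # self-reported feelings of inadequacy, 1-7

sessions = [
    SessionEvaluation("Gulf Coast English", wer=0.18, retries=3,
                      code_switched=True, effort_rating=6, inadequacy_rating=5),
    SessionEvaluation("Standard American English", wer=0.17, retries=1,
                      code_switched=False, effort_rating=2, inadequacy_rating=1),
]

# Near-identical WER, very different experiential cost: exactly the gap the
# paper says accuracy-only audits cannot see.
for s in sessions:
    print(f"{s.dialect}: WER={s.wer:.2f}, retries={s.retries}, "
          f"effort={s.effort_rating}")
```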

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Voice interfaces in other domains such as virtual assistants may impose comparable unseen adaptation costs on the same speaker groups.
  • Incorporating measures of user effort and cultural fit during model development could reduce the need for compensatory behaviors like code-switching.
  • Qualitative methods that surface non-quantifiable harms could be extended to fairness audits of additional language technologies.

Load-bearing premise

That participants' open-ended interview responses accurately reflect the full emotional and cognitive costs without distortion from the interview context, self-reporting biases, or researcher interpretation.

What would settle it

A controlled comparison that held error rates constant and found no difference in reported emotional labor, self-monitoring effort, or feelings of inadequacy between standard-dialect and non-standard-dialect ASR users would falsify the claim that accuracy metrics miss these harms.
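A minimal sketch of that falsifying comparison: hold WER constant across dialect groups and test whether self-reported adaptation effort still differs. All scores below are hypothetical placeholders, not data from the paper.

```python
# Sketch of the controlled comparison: effort ratings from sessions matched
# on WER, compared across dialect groups. Numbers are invented placeholders.
from scipy.stats import mannwhitneyu

# Self-reported effort scores (1-7 Likert) from WER-matched sessions.
standard_dialect_effort = [2, 3, 2, 4, 3, 2, 3]
nonstandard_dialect_effort = [5, 6, 4, 6, 5, 7, 5]

stat, p = mannwhitneyu(standard_dialect_effort, nonstandard_dialect_effort,
                       alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
# A null result at matched WER would undercut the claim that accuracy metrics
# miss dialect-linked experiential harms; a significant difference supports it.
```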

read the original abstract

Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents findings from qualitative user experience studies conducted across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) with speakers of distinct English dialect communities. It claims that ASR bias evaluations relying solely on accuracy metrics overlook critical dimensions of harm: the emotional labor of repeated technological rejection, the cognitive burden of constant self-monitoring and code-switching, and the psychological toll of internalized feelings of inadequacy in one's native language variety. These harms persist despite participants' awareness that the systems were not designed for them and despite their willingness to contribute to improvements.

Significance. If the results hold, the work meaningfully advances ASR fairness research by providing empirical evidence from participant narratives that accuracy-only assessments are incomplete, highlighting unmeasured experiential costs. The manuscript earns credit for supplying participant demographics, interview protocols, thematic coding processes, and direct quotes that ground the interpretive claims without internal contradictions or unsupported leaps.

minor comments (2)
  1. Abstract: The summary of findings would be strengthened by briefly noting sample sizes and the high-level structure of the thematic analysis (e.g., number of participants and coding steps), which are detailed in the full text but absent from the abstract.
  2. Section on methodology: Clarify whether inter-coder reliability metrics or member-checking procedures were used in the thematic analysis to further address potential interpretive bias, even if the current description is already transparent; a minimal sketch of such a reliability check follows.
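For concreteness, the reliability check the referee asks about could look like a Cohen's kappa computation between two coders' theme labels. The codes and label sequences here are hypothetical, not taken from the paper's codebook.

```python
# Chance-corrected agreement between two coders' theme assignments.
# Labels are invented for illustration.
from sklearn.metrics import cohen_kappa_score

coder_a = ["code-switching", "frustration", "inadequacy", "frustration",
           "code-switching", "hyper-articulation", "inadequacy"]
coder_b = ["code-switching", "frustration", "frustration", "frustration",
           "code-switching", "hyper-articulation", "inadequacy"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")
```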

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of its significance in advancing ASR fairness research beyond accuracy metrics, and the recommendation for minor revision. The assessment that our qualitative findings are grounded in participant narratives without unsupported claims is appreciated.

Circularity Check

0 steps flagged

No significant circularity in this qualitative empirical study

full rationale

This is a qualitative user-experience study drawing on interview narratives from dialect speakers across four U.S. sites. It contains no equations, fitted parameters, predictions, or derivation chains. All load-bearing claims rest on direct participant quotes, thematic coding descriptions, and reported patterns, which are independent of any self-citation or internal redefinition. The work is therefore self-contained as straightforward empirical reporting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative empirical study with no mathematical models, free parameters, formal axioms, or postulated entities; it relies on standard social science methods of data collection and thematic interpretation.

pith-pipeline@v0.9.0 · 5567 in / 1135 out tokens · 21328 ms · 2026-05-09T23:36:15.548051+00:00 · methodology

