pith. machine review for the scientific record.

arxiv: 2604.21148 · v2 · submitted 2026-04-22 · 💻 cs.CL

Recognition: unknown

"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:36 UTC · model grok-4.3

classification 💻 cs.CL
keywords automatic speech recognition · ASR bias · user experience · emotional impact · dialect variation · algorithmic fairness · qualitative analysis · linguistic diversity

The pith

ASR bias evaluations based on accuracy alone overlook the emotional labor, self-monitoring, and internalized inadequacy experienced by speakers of non-standard dialects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports user experience studies with speakers of distinct U.S. English dialects in four locations to examine how ASR system failures shape lived experiences. Participants describe technologies that ignore their cultural backgrounds, forcing repeated adjustments such as code-switching and hyper-articulation just to achieve basic functionality. These interactions produce frustration and feelings of inadequacy that persist even when users know the systems were not built for their varieties. Standard fairness checks that count only error rates therefore miss the cognitive burden of constant self-monitoring and the psychological costs of repeated technological rejection.
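To make concrete what an accuracy-only check actually measures, here is a minimal word-error-rate (WER) sketch. The transcripts are hypothetical illustrations, not the paper's data or its evaluation pipeline.

```python
# Minimal WER sketch: the only signal an accuracy-only fairness check sees.
# Transcripts are hypothetical, invented for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Identical WER, very different user experience: the metric cannot tell a
# first-attempt utterance from one produced after hyper-articulation and
# code-switching on a third retry.
print(wer("turn on the kitchen lights", "turn on the kitchen light"))  # 0.2
```

The score captures transcript mismatch and nothing else, which is precisely the gap the paper identifies.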

Core claim

Qualitative analysis of open-ended narratives from participants across Atlanta, Gulf Coast, Miami Beach, and Tucson shows that ASR systems encode particular varieties as standard while marginalizing others. Users perform extensive invisible labor including code-switching, hyper-articulation, and emotional management to compensate, yet still internalize failures as personal shortcomings despite recognizing that the systems were not designed for them. Algorithmic fairness assessments based on accuracy metrics alone therefore miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.

What carries the argument

Qualitative analysis of user narratives from dialect-specific UX studies that uncovers invisible adaptation labor and internalized harm beyond error counts.

If this is right

  • Fairness evaluations of ASR systems must track emotional labor and psychological impact in addition to word-error rates (a sketch of such an extended evaluation record follows this list).
  • Speech technology design should treat diverse language varieties as legitimate targets rather than sources of user adaptation.
  • Users' expressed willingness to contribute data and feedback creates opportunities for participatory improvement that values their linguistic knowledge.
  • Awareness that systems are biased does not prevent users from internalizing failures as personal inadequacy.
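As a hedged sketch of the first implication, an "accuracy-plus-experience" evaluation record might pair WER with measures of adaptation effort. The field names and values below are invented for illustration; the paper proposes no specific schema.

```python
# Hypothetical evaluation record pairing accuracy with experiential cost.
from dataclasses import dataclass

@dataclass
class SessionEvaluation:
    dialect: str            # self-identified language variety
    wer: float              # standard accuracy metric
    retries: int            # repeated attempts before success
    code_switched: bool     # did the user shift toward a standard variety?
    effort_rating: int      # self-reported effort, e.g. 1-7 Likert
    inadequacy_rating: int  # self-reported feelings of inadequacy, 1-7

sessions = [
    SessionEvaluation("Gulf Coast English", wer=0.18, retries=3,
                      code_switched=True, effort_rating=6, inadequacy_rating=5),
    SessionEvaluation("Standard American English", wer=0.17, retries=1,
                      code_switched=False, effort_rating=2, inadequacy_rating=1),
]

# Near-identical WER, very different experiential cost: exactly the gap the
# paper says accuracy-only audits cannot see.
for s in sessions:
    print(f"{s.dialect}: WER={s.wer:.2f}, retries={s.retries}, "
          f"effort={s.effort_rating}")
```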

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Voice interfaces in other domains such as virtual assistants may impose comparable unseen adaptation costs on the same speaker groups.
  • Incorporating measures of user effort and cultural fit during model development could reduce the need for compensatory behaviors like code-switching.
  • Qualitative methods that surface non-quantifiable harms could be extended to fairness audits of additional language technologies.

Load-bearing premise

That participants' open-ended interview responses accurately reflect the full emotional and cognitive costs without distortion from the interview context, self-reporting biases, or researcher interpretation.

What would settle it

A controlled comparison that held error rates constant and found no difference in reported emotional labor, self-monitoring effort, or feelings of inadequacy between standard-dialect and non-standard-dialect ASR users would falsify the claim that accuracy metrics miss these harms.
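A minimal sketch of that falsifying comparison: hold WER constant across dialect groups and test whether self-reported adaptation effort still differs. All scores below are hypothetical placeholders, not data from the paper.

```python
# Sketch of the controlled comparison: effort ratings from sessions matched
# on WER, compared across dialect groups. Numbers are invented placeholders.
from scipy.stats import mannwhitneyu

# Self-reported effort scores (1-7 Likert) from WER-matched sessions.
standard_dialect_effort = [2, 3, 2, 4, 3, 2, 3]
nonstandard_dialect_effort = [5, 6, 4, 6, 5, 7, 5]

stat, p = mannwhitneyu(standard_dialect_effort, nonstandard_dialect_effort,
                       alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
# A null result at matched WER would undercut the claim that accuracy metrics
# miss dialect-linked experiential harms; a significant difference supports it.
```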

read the original abstract

Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents findings from qualitative user experience studies conducted across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) with speakers of distinct English dialect communities. It claims that ASR bias evaluations relying solely on accuracy metrics overlook critical dimensions of harm: the emotional labor of repeated technological rejection, the cognitive burden of constant self-monitoring and code-switching, and the psychological toll of internalized feelings of inadequacy in one's native language variety. These harms persist despite participants' awareness that the systems were not designed for them and despite their willingness to contribute to improvements.

Significance. If the results hold, the work meaningfully advances ASR fairness research by providing empirical evidence from participant narratives that accuracy-only assessments are incomplete, highlighting unmeasured experiential costs. The manuscript earns credit for supplying participant demographics, interview protocols, thematic coding processes, and direct quotes that ground the interpretive claims without internal contradictions or unsupported leaps.

minor comments (2)
  1. Abstract: The summary of findings would be strengthened by briefly noting sample sizes and the high-level structure of the thematic analysis (e.g., number of participants and coding steps), which are detailed in the full text but absent from the abstract.
  2. Section on methodology: Clarify whether inter-coder reliability metrics or member-checking procedures were used in the thematic analysis to further address potential interpretive bias, even if the current description is already transparent; a minimal sketch of such a reliability check follows.
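For concreteness, the reliability check the referee asks about could look like a Cohen's kappa computation between two coders' theme labels. The codes and label sequences here are hypothetical, not taken from the paper's codebook.

```python
# Chance-corrected agreement between two coders' theme assignments.
# Labels are invented for illustration.
from sklearn.metrics import cohen_kappa_score

coder_a = ["code-switching", "frustration", "inadequacy", "frustration",
           "code-switching", "hyper-articulation", "inadequacy"]
coder_b = ["code-switching", "frustration", "frustration", "frustration",
           "code-switching", "hyper-articulation", "inadequacy"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")
```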

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of its significance in advancing ASR fairness research beyond accuracy metrics, and the recommendation for minor revision. The assessment that our qualitative findings are grounded in participant narratives without unsupported claims is appreciated.

Circularity Check

0 steps flagged

No significant circularity in this qualitative empirical study

full rationale

This is a qualitative user-experience study drawing on interview narratives from dialect speakers across four U.S. sites. It contains no equations, fitted parameters, predictions, or derivation chains. All load-bearing claims rest on direct participant quotes, thematic coding descriptions, and reported patterns, which are independent of any self-citation or internal redefinition. The work is therefore self-contained as straightforward empirical reporting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative empirical study with no mathematical models, free parameters, formal axioms, or postulated entities; it relies on standard social science methods of data collection and thematic interpretation.

pith-pipeline@v0.9.0 · 5567 in / 1135 out tokens · 21328 ms · 2026-05-09T23:36:15.548051+00:00 · methodology

