pith.

arxiv: 2605.21035 · v1 · pith:LD2B72GT · submitted 2026-05-20 · 💻 cs.HC

The Quiet Path from Seemingly Minor Design Errors to Workplace AI Incidents

Pith reviewed 2026-05-21 03:55 UTC · model grok-4.3

classification 💻 cs.HC
keywords workplace AI · incident reports · design misalignment · worker preferences · developer priorities · AI traits · task automation · HCI

The pith

Analysis of 1,524 workplace AI incident reports finds that 83% stem from misalignments where workers need precise and personal systems but receive basic ones optimized for speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines reports of AI-related workplace incidents and compares the actual traits of the systems involved against what workers say they need for the same tasks. It establishes that the large majority of incidents arise because developers build systems that prioritize efficiency and generality while workers require precision, insight, and personalization. A sympathetic reader would care because this gap points to a concrete, addressable cause of harm that erodes worker agency and productivity over time. The study also tracks how the dominant mismatched traits have shifted with the arrival of generative AI.

Core claim

We analyzed 1,524 reports of incidents in which AI systems were used to perform 171 occupational tasks across 12 industry sectors. Using an LLM-as-an-expert approach, we extracted the main traits of the AI systems involved in those incidents using an established framework of twelve traits. We then compared them with the traits that 202 workers highly familiar with those tasks would have preferred. We found that as many as 83% of workplace incidents stem from worker-AI misalignments. In most cases, workers wanted systems that are precise, insightful, or personal, but instead received systems that are basic, simple, or general. We also compared the traits causing the incidents with the traits that 197 developers building AI systems for those tasks would have preferred.

What carries the argument

Comparison of twelve AI system traits extracted from incident reports against the traits preferred by workers and by developers.
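
The Figure 2 caption states the incident-level counting rule. An incident counts as caused by misaligned AI if at least one of its task occurrences had a trait that differed from the worker-preferred trait, and that occurrence contributed to the incident. The sketch below assumes this reading. The `TaskOccurrence` record and the dimension keys `d1` to `d3` are hypothetical illustrations, and so is the pairing of opposite traits under each key. None of them is the paper's data model or the framework's actual set of trait pairs.

```python
from dataclasses import dataclass


# Hypothetical record: one task occurrence within one incident.
# Traits are keyed by an illustrative dimension name; the pairing of
# opposite traits under each key is an assumption, not the framework's.
@dataclass
class TaskOccurrence:
    incident_id: str
    task_id: str
    ai_traits: dict[str, str]      # traits extracted from the incident report
    worker_traits: dict[str, str]  # traits workers would have preferred
    contributed: bool              # did this occurrence contribute to the incident?


def is_misaligned(occ: TaskOccurrence) -> bool:
    """True if the AI trait differs from the worker preference on any shared dimension."""
    return any(
        dim in occ.worker_traits and trait != occ.worker_traits[dim]
        for dim, trait in occ.ai_traits.items()
    )


def misaligned_incident_share(occurrences: list[TaskOccurrence]) -> float:
    """Percentage of incidents with at least one misaligned, contributing task occurrence."""
    flagged: dict[str, bool] = {}
    for occ in occurrences:
        hit = occ.contributed and is_misaligned(occ)
        flagged[occ.incident_id] = flagged.get(occ.incident_id, False) or hit
    if not flagged:
        return 0.0
    return 100.0 * sum(flagged.values()) / len(flagged)


# Toy input (not study data).
occurrences = [
    TaskOccurrence("inc1", "taskA", {"d1": "basic"}, {"d1": "precise"}, True),
    TaskOccurrence("inc1", "taskB", {"d1": "precise"}, {"d1": "precise"}, True),
    TaskOccurrence("inc2", "taskC", {"d2": "fast"}, {"d2": "fast"}, True),
    TaskOccurrence("inc3", "taskA", {"d3": "general"}, {"d3": "personal"}, False),
]
print(f"{misaligned_incident_share(occurrences):.1f}%")  # 33.3%
```

Only `inc1` is flagged here. `inc3` has a trait mismatch, but its occurrence did not contribute to the incident, so it is not counted.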

If this is right

  • Workplace AI incidents are likely to persist without design corrections that better align systems with workers' needs.
  • The mismatch causes an invisible erosion of worker agency.
  • Organizational productivity declines as a result of these repeated incidents.
  • About 74% of task misalignments can be attributed to developers, who tend to overfocus on efficiency and speed, especially for systems performing tasks in people-facing occupations such as human resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Organizations could reduce incidents by involving workers earlier in specifying the desired traits of AI tools for their tasks.
  • The same mismatch pattern may appear in consumer or public-sector AI uses and could be studied with similar incident-report methods.
  • Monitoring shifts in dominant AI traits over time, such as the rise of imaginative systems after generative AI arrived, offers a way to anticipate new incident categories.

Load-bearing premise

The LLM extraction of the twelve traits from incident reports accurately identifies the design features that caused the incidents without bias, and the surveyed workers' stated preferences are representative of what would have prevented those specific incidents.

What would settle it

Redesign several AI systems for the same tasks to match the worker-preferred traits identified in the study, then measure whether the rate of reported incidents drops compared with control systems left unchanged.
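
The proposed experiment could be analysed in several ways. One simple option assumes that each deployed system is recorded as having or not having a reported incident during the study window. Under that assumption, a two-sided two-proportion z-test compares the redesigned and control groups. The function name and the counts below are invented for illustration. In practice, whether systems are assigned to groups at random would matter more than the choice of test.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion.

    x1/n1: systems with a reported incident in group 1 (e.g. control)
    x2/n2: systems with a reported incident in group 2 (e.g. redesigned)
    Returns (z, p_value); z > 0 means group 1 had the higher incident rate.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 30 of 200 control systems vs 15 of 200 redesigned systems.
z, p = two_proportion_z_test(30, 200, 15, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```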

Figures

Figures reproduced from arXiv: 2605.21035 by Andrés Gvirtz, Daniele Quercia, Julia De Miguel Velázquez, Sanja Šćepanović.

Figure 1: Overview of our research design. We identify incidents caused by AI at work and gather worker and developer preferences for how AI should be for a set of tasks from O*NET (a standardized job task database) (Step 1); we identify the tasks exposed to the AI systems in the incidents (Step 2); we identify the subset of those tasks caused by misaligned AI (Step 3); and we grouped those tasks by whether th…

Figure 2: Percentage of incidents caused by misaligned AI across sectors. By misaligned AI, we mean an AI system that performed a task with traits that differed from the traits workers would have preferred for that system. We count an incident as caused by misaligned AI if at least one task occurrence in the incident was misaligned and this contributed to the incident. The bars represent the percentage of incidents…

Figure 3: Percentage of task occurrences exhibiting a given misaligned AI trait, out of all task occurrences with a pair of misaligned AI traits that resulted in an incident. A task occurrence is counted each time a task from our set of 93 tasks appears in one of the 214 incidents; since an incident may involve multiple tasks, the same task can be counted more than once. For each pair of AI traits, the two bars show…

Figure 4: Top three most frequent reasons for misalignment in each sector. By misaligned AI, we mean an AI system that performed a task with traits that differed from the traits workers would have preferred for that system (first two rows in the graph). The numbers in the black circles represent the number of the sector’s task occurrences where the AI exhibited a given trait and the worker preferred the opposite tra…

Figure 5: The tasks with a higher fraction of incidents caused by misaligned AI. For each task, we show the two most prevalent traits where an AI system performed a task with a trait that differed from the trait the workers would have preferred for that system. We report the percentage of incidents associated with a given task in which the AI showed a given trait and was misaligned, out of all incidents linked to th…

Figure 6: Percentage of task occurrences involving incidents of misaligned AI (solid line) or incidents of not misaligned AI (dashed line) over time. Each data point represents the percentage of task occurrences classified as specified in the label within a year. We merged 2014-2020 as there were significantly fewer data points. Vertical lines mark milestones in AI research and deployment for context [64]. Misalign…

Figure 7: Percentage of task occurrences in incidents with AI misalignment attributable to developers (black bar) or to other causes (gray bar), out of all task occurrences with AI misalignment within each sector (total in parentheses). We attribute the responsibility of a task occurrence with AI misalignment to the developer if the developer designed the AI to perform a task with at least one trait that differed f…

Figure 8: Top three most frequent misalignments attributable to developers in each sector. We attribute the responsibility of a task occurrence with AI misalignment to the developer when the developer designed the AI system to perform a task with traits that differed from the traits workers would have preferred for that system (first two rows in the figure). The numbers in the black circles represent the number of t…

Figure 9: Percentage of task occurrences involving incidents of misaligned AI for each pair of traits over time. Each data point represents the percentage of task occurrences classified as specified in the label within a year. We merged 2014-2020 as there were significantly fewer data points. For each pair of traits, the first trait listed (e.g., complex) is represented by the solid line, and the second trait listed…
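
Figures 6 and 9 report, for each year, the percentage of task occurrences in a given category, with 2014-2020 merged into one bin. Below is a minimal sketch of that binning. It assumes each task occurrence has already been reduced to a hypothetical `(year, misaligned)` pair. The function names are illustrative and this is not the authors' code.

```python
from collections import Counter


def year_bin(year: int) -> str:
    """Merge 2014-2020 into one bin, as described for Figure 6; other years stay separate."""
    return "2014-2020" if 2014 <= year <= 2020 else str(year)


def yearly_misaligned_share(records: list[tuple[int, bool]]) -> dict[str, float]:
    """Percentage of task occurrences flagged as misaligned, per year bin."""
    totals: Counter = Counter()
    misaligned: Counter = Counter()
    for year, flag in records:
        label = year_bin(year)
        totals[label] += 1
        if flag:
            misaligned[label] += 1
    # Order bins chronologically by their first year.
    ordered = sorted(totals, key=lambda s: int(s[:4]))
    return {label: 100.0 * misaligned[label] / totals[label] for label in ordered}


# Toy (year, misaligned) pairs, not study data.
records = [(2015, True), (2019, False), (2023, True), (2023, True), (2023, False)]
for label, pct in yearly_misaligned_share(records).items():
    print(f"{label}: {pct:.1f}%")
# 2014-2020: 50.0%
# 2023: 66.7%
```

Suppose every task occurrence is classified as either misaligned or not. Under that assumption, the dashed not-misaligned line in Figure 6 would be 100 minus each value returned here.
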
Original abstract

Recent human-computer interaction (HCI) research has revealed a widespread misalignment between how developers design workplace artificial intelligence (AI) systems and what workers actually need from them. Yet, little research has examined the effects of this gap, or how it may cause harm. We analyzed 1,524 reports of incidents in which AI systems were used to perform 171 occupational tasks across 12 industry sectors. Using a Large Language Model (LLM)-as-an-expert approach, we extracted the main traits of the AI systems involved in those incidents using an established framework of twelve traits. We then compared them with the traits that 202 workers highly familiar with those tasks would have preferred. We found that as many as 83% of workplace incidents stem from worker-AI misalignments. In most cases, workers wanted systems that are precise, insightful, or personal, but instead received systems that are basic, simple, or general. Over the years, fast AI caused a considerable number of incidents, yet these declined, and imaginative AI, with the mass introduction of generative AI, started to cause incidents. We also compared the traits causing the incidents with the traits that 197 developers building AI systems for those tasks would have preferred. If the traits causing the incidents were the same as those designed by developers, then developers may be responsible for those incidents. We found that 74% of task misalignments could be attributed to developers who tended to overfocus on efficiency and speed, especially for systems performing tasks in people-facing occupations such as those in the human resources sector. Our results call for design interventions that better align AI development with workers' needs, as without such corrections, workplace AI incidents are likely to persist, causing the invisible erosion of worker agency and organizational productivity.
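
The abstract's attribution step can be read as a rule applied to each misaligned task occurrence. First, find a dimension where the AI's trait differs from the worker preference. If the developer-preferred trait on that dimension matches the AI's trait, the misalignment is attributed to the developer. The sketch below assumes this reading of the abstract and of the Figure 7 caption. It uses hypothetical dimension keys and toy inputs and is not the authors' code.

```python
Traits = dict[str, str]  # illustrative dimension key -> trait (hypothetical)


def misaligned_dims(ai: Traits, worker: Traits) -> list[str]:
    """Dimensions where the AI's trait differs from the worker-preferred trait."""
    return [d for d, t in ai.items() if d in worker and t != worker[d]]


def attributable_to_developer(ai: Traits, worker: Traits, developer: Traits) -> bool:
    """True if, on at least one misaligned dimension, developers preferred the AI's trait."""
    return any(developer.get(d) == ai[d] for d in misaligned_dims(ai, worker))


def developer_share(occurrences: list[tuple[Traits, Traits, Traits]]) -> float:
    """Percentage of misaligned task occurrences attributable to developers."""
    misaligned = [o for o in occurrences if misaligned_dims(o[0], o[1])]
    if not misaligned:
        return 0.0
    hits = sum(attributable_to_developer(ai, w, dev) for ai, w, dev in misaligned)
    return 100.0 * hits / len(misaligned)


# Toy (ai, worker, developer) triples, not study data.
occurrences = [
    ({"d1": "basic"}, {"d1": "precise"}, {"d1": "basic"}),        # misaligned, developer-attributed
    ({"d3": "general"}, {"d3": "personal"}, {"d3": "personal"}),  # misaligned, other cause
    ({"d1": "precise"}, {"d1": "precise"}, {"d1": "basic"}),      # aligned: excluded
]
print(f"{developer_share(occurrences):.1f}%")  # 50.0%
```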

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes 1,524 reports of workplace AI incidents across 171 occupational tasks in 12 sectors. Using an LLM-as-an-expert method to extract twelve traits from the incident descriptions, it compares these extracted traits against preferences stated by 202 workers and 197 developers. The central claims are that up to 83% of incidents arise from worker-AI misalignments (workers preferring precise/insightful/personal traits but receiving basic/simple/general ones) and that 74% of task misalignments can be attributed to developers who over-prioritize efficiency and speed.

Significance. If the core percentages hold after validation, the work supplies a large-scale empirical mapping from specific design-trait gaps to documented incidents, strengthening the case for worker-centered AI design interventions in HCI. The scale of the incident corpus and the direct comparison to both worker and developer preferences are notable strengths that could inform practical guidelines.

major comments (2)
  1. [Methods] Methods (LLM-as-an-expert trait extraction): The description of the LLM prompt and trait-labeling procedure supplies no human validation, inter-rater reliability statistics, prompt-sensitivity tests, or ablation on how ambiguous reports were resolved. Because the 83% misalignment figure and the subsequent 74% developer-attribution claim rest entirely on these labels, any systematic bias in the extraction directly undermines both headline statistics.
  2. [Results] Results (83% and 74% claims): No error bars, exclusion criteria, or robustness checks are reported for the misalignment percentages or the developer-attribution step. Without these, it is impossible to determine whether the reported proportions are stable under reasonable variations in trait definitions or coding decisions.
minor comments (2)
  1. [Abstract] Abstract: The statement that 'fast AI caused a considerable number of incidents, yet these declined, and imaginative AI... started to cause incidents' would be clearer with explicit time periods or supporting counts from the dataset.
  2. [Throughout] Notation: The twelve-trait framework is referenced repeatedly; a brief table or appendix listing the exact trait definitions and their opposites would improve readability for readers unfamiliar with the source framework.
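
Major comment 1 asks for inter-rater reliability. This is usually reported as Cohen's kappa (Cohen, 1960, in the reference list below). Kappa corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). Below is a dependency-free sketch for two coders who label the same incident reports. The function name, the coders, and the trait labels are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters assigning one nominal label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout: kappa is undefined,
        # so return 1.0 by convention because agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels for four reports from two hypothetical coders.
coder_1 = ["precise", "basic", "precise", "general"]
coder_2 = ["precise", "basic", "general", "general"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.636
```

Landis and Koch (1977), also in the reference list, treat 0.61 to 0.80 as substantial agreement. The toy value above falls in that band.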

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important opportunities to strengthen the methodological transparency and statistical robustness of our findings. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: [Methods] Methods (LLM-as-an-expert trait extraction): The description of the LLM prompt and trait-labeling procedure supplies no human validation, inter-rater reliability statistics, prompt-sensitivity tests, or ablation on how ambiguous reports were resolved. Because the 83% misalignment figure and the subsequent 74% developer-attribution claim rest entirely on these labels, any systematic bias in the extraction directly undermines both headline statistics.

    Authors: We agree that the current description of the LLM-as-an-expert procedure lacks explicit validation steps. In the revised manuscript we will add a dedicated validation subsection that reports: (1) a human validation study on a random sample of 150 incident reports independently coded by two domain experts, with inter-rater reliability measured by Cohen’s kappa; (2) prompt-sensitivity experiments in which we re-extract traits using three alternative prompt phrasings and quantify label stability; and (3) a clear protocol for resolving ambiguous reports, including the use of multiple LLM runs followed by majority vote. These additions will allow readers to assess potential systematic bias in the trait labels that underpin the 83% and 74% figures. revision: yes

  2. Referee: [Results] Results (83% and 74% claims): No error bars, exclusion criteria, or robustness checks are reported for the misalignment percentages or the developer-attribution step. Without these, it is impossible to determine whether the reported proportions are stable under reasonable variations in trait definitions or coding decisions.

    Authors: We acknowledge that the reported percentages currently lack accompanying uncertainty estimates and sensitivity analyses. In the revision we will: (1) compute and report 95% bootstrap confidence intervals for both the 83% misalignment rate and the 74% developer-attribution rate; (2) explicitly state the exclusion criteria applied to the 1,524 reports (e.g., insufficient textual detail for reliable trait extraction); and (3) present robustness checks that recompute the key percentages after perturbing trait boundary definitions and after excluding the most ambiguous 10% of reports. These additions will demonstrate the stability of the headline statistics under reasonable variations in coding decisions. revision: yes
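
The second response promises 95% bootstrap confidence intervals. These could be computed with a percentile bootstrap over incidents. The sketch below makes three assumptions: each incident is reduced to a single misaligned/not-misaligned flag; incidents are resampled with replacement; and the function name and toy flags are hypothetical.

```python
import random
from collections.abc import Sequence


def bootstrap_percent_ci(
    flags: Sequence[bool], n_resamples: int = 10_000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float, float]:
    """Point estimate and percentile-bootstrap CI for the percentage of True flags."""
    if not flags:
        raise ValueError("flags must be non-empty")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(flags)
    # Resample n incidents with replacement and record the percentage each time.
    stats = sorted(
        100.0 * sum(rng.choices(flags, k=n)) / n for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return 100.0 * sum(flags) / n, lo, hi


# Toy incident-level flags (not study data): 30 misaligned out of 40.
flags = [True] * 30 + [False] * 10
point, lo, hi = bootstrap_percent_ci(flags)
print(f"{point:.1f}% (95% CI {lo:.1f}-{hi:.1f})")
```

Rerunning the same function after dropping the most ambiguous reports gives a simple version of the robustness check described in the response.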

Circularity Check

0 steps flagged

No circularity: empirical comparison of LLM-extracted traits against independent surveys

full rationale

The paper conducts an empirical analysis of 1,524 incident reports by applying an LLM to extract traits from an established twelve-trait framework, then directly compares those extracted traits to separate survey responses from 202 workers and 197 developers. No equations, fitted parameters, or derivations are described that reduce to the inputs by construction. The 83% and 74% figures arise from counting mismatches between the two independently collected datasets rather than from any self-definitional loop or renamed fit. Self-citations, if present, are not load-bearing for the central claims, which rest on the external survey data and incident corpus.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis depends on the validity of an external twelve-trait framework and on the assumption that incident reports and survey responses can be directly compared without major selection or reporting biases.

axioms (1)
  • Domain assumption: The established framework of twelve traits accurately and exhaustively captures the design characteristics relevant to workplace AI incidents.
    Invoked when the LLM extracts traits from reports and when those traits are compared to worker and developer preferences.

pith-pipeline@v0.9.0 · 5879 in / 1397 out tokens · 35072 ms · 2026-05-21T03:55:06.884064+00:00 · methodology



Reference graph

Works this paper leans on

115 extracted references · 115 canonical work pages · 3 internal anchors

  1. [1]

    Daron Acemoglu, David Autor, and Simon Johnson. 2023. Can we have pro-worker AI. Choosing a path (2023)

  2. [2]

    AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC). 2026. AIAAIC Repository of AI, Algorithmic, and Automation Incidents and Controversies. https://www.aiaaic.org/. Accessed: 2026-01-06

  3. [3]

    Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, Glasgow Scotland Uk, 1–13. doi:10....

  4. [4]

    Elske Ammenwerth, Carola Iller, and Cornelia Mahler. 2006. IT-adoption and the interaction of task, technology and individuals: a fit framework and a case study. BMC Medical Informatics and Decision Making 6, 1 (Jan. 2006), 3. doi:10.1186/1472-6947-6-3

  5. [5]

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. 2021. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861 (2021)

  6. [6]

    David H. Autor. 2013. The “task approach” to labor markets: an overview. Journal for Labour Market Research 46, 3 (Sept. 2013), 185–199. doi:10.1007/s12651-013-0128-z

  7. [7]

    Ezra Awumey, Sauvik Das, and Jodi Forlizzi. 2024. A systematic review of biometric monitoring in the workplace: analyzing socio-technical harms in development, deployment and use. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 920–932

  8. [8]

    Ezra Awumey, Sauvik Das, and Jodi Forlizzi. 2024. A Systematic Review of Biometric Monitoring in the Workplace: Analyzing Socio-technical Harms in Development, Deployment and Use. In The 2024 ACM Conference on Fairness Accountability and Transparency. ACM, Rio de Janeiro Brazil, 920–932. doi:10.1145/3630106.3658945

  9. [9]

    Jascha Bareis and Christian Katzenbach. 2022. Talking AI into being: The narratives and imaginaries of national AI strategies and their performative politics. Science, Technology, & Human Values 47, 5 (2022), 855–881

  10. [10]

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, Virtual Event Canada, 610–623. doi:10.1145/3442188.3445922

  11. [11]

    Federico Bianchi, Amanda Cercas Curry, and Dirk Hovy. 2023. Artificial intelligence accidents waiting to happen? Journal of Artificial Intelligence Research 76 (2023), 193–199

  12. [12]

    Abeba Birhane, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventresque, and Christopher L. Dancy

  13. [13]

    The forgotten margins of AI ethics. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 948–958

    The Quiet Path from Seemingly Minor Design Errors to Workplace AI Incidents · FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  14. [14]

    Kim M Blankenship, Samuel R Friedman, Shari Dworkin, and Joanne E Mantell. 2006. Structural interventions: concepts, challenges and opportunities for research. Journal of Urban Health 83, 1 (2006), 59–72

  15. [15]

    Edyta Bogucka, Sanja Šćepanović, and Daniele Quercia. 2024. Atlas of AI Risks: Enhancing Public Understanding of AI Risks. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 12 (Oct. 2024), 33–43. doi:10.1609/hcomp.v12i1.31598

  16. [16]

    Edyta Paulina Bogucka, Marios Constantinides, Julia De Miguel Velazquez, Sanja Scepanovic, Daniele Quercia, and Andrés Gvirtz. 2024. The Atlas of AI Incidents in Mobile Computing: Visualizing the Risks and Benefits of AI Gone Mobile. In Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction (Melbourne, VIC, Australia) (...

  17. [17]

    Margarita Boyarskaya, Alexandra Olteanu, and Kate Crawford. 2020. Overcoming failures of imagination in AI infused system development and deployment. arXiv preprint arXiv:2011.13416 (2020)

  18. [18]

    Michelle Brachman, Amina El-Ashry, Casey Dugan, and Werner Geyer. 2025. Current and future use of large language models for knowledge work. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–24

  19. [19]

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901

  20. [20]

    Erik Brynjolfsson, Danielle Li, and Lindsey Raymond. 2025. Generative AI at work. The Quarterly Journal of Economics 140, 2 (2025), 889–942

  21. [21]

    Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, and Dan Jurafsky. 2025. Social sycophancy: A broader understanding of llm sycophancy. arXiv preprint arXiv:2505.13995 (2025)

  22. [22]

    Cliffe Dekker Hofmeyr. 2025. Another episode of fabricated citations, real repercussions: South African courts show no tolerance for AI-hallucinated cases. https://www.cliffedekkerhofmeyr.com/en/news/publications/2025/Practice/Employment-Law/combined-employment-and-knowledge-management-alert-4-july-Another-episode-of-fabricated-citations-real-repercussio...

  23. [23]

    Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement 20, 1 (1960), 37–46

  24. [24]

    Julia De Miguel Velázquez, Sanja Šćepanović, Andrés Gvirtz, and Daniele Quercia. 2024. Decoding Real-World Artificial Intelligence Incidents. Computer 57, 11 (2024), 71–81. doi:10.1109/MC.2024.3432492

  25. [25]

    Catherine D’ignazio and Lauren F Klein. 2023. Data feminism. MIT press

  26. [26]

    Mengchen Dong, Jane Rebecca Conway, Jean-François Bonnefon, Azim Shariff, and Iyad Rahwan. 2024. Fears about artificial intelligence across 20 countries and six domains of application. American Psychologist (2024)

  27. [27]

    Mengchen Dong, Jane Rebecca Conway, Jean-François Bonnefon, Azim Shariff, and Iyad Rahwan. 2024. Fears about artificial intelligence across 20 countries and six domains of application. American Psychologist (2024). doi:10.1037/amp0001454 Place: US Publisher: American Psychological Association

  28. [28]

    Madeleine Clare Elish. 2019. Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction. Engaging Science, Technology, and Society 5 (March 2019), 40–60. doi:10.17351/ests2019.260

  29. [29]

    Madeleine Clare Elish and Danah Boyd. 2018. Situating methods in the magic of Big Data and AI. Communication monographs 85, 1 (2018), 57–80

  30. [30]

    Michael Feffer, Nikolas Martelaro, and Hoda Heidari. 2023. The AI Incident Database as an Educational Tool to Raise Awareness of AI Harms: A Classroom Exploration of Efficacy, Limitations, & Future Improvements. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23). Association for Computing M...

  31. [31]

    Diana E Forsythe. 1993. Engineering knowledge: The construction of knowledge in artificial intelligence. Social studies of science 23, 3 (1993), 445–477

  32. [32]

    Sarah E. Fox, Vera Khovanskaya, Clara Crivellaro, Niloufar Salehi, Lynn Dombrowski, Chinmay Kulkarni, Lilly Irani, and Jodi Forlizzi

  33. [33]

    Worker-Centered Design: Expanding HCI Methods for Supporting Labor. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA ’20). Association for Computing Machinery, New York, NY, USA, 1–8. doi:10.1145/3334480.3375157

  34. [34]

    Sarah E. Fox, Samantha Shorey, Esther Y. Kang, Dominique Montiel Valle, and Estefania Rodriguez. 2023. Patchwork: The Hidden, Human Labor of AI Integration within Essential Work. Proc. ACM Hum.-Comput. Interact. 7, CSCW1 (April 2023), 81:1–81:20. doi:10.1145/3579514

  35. [35]

    Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin La...

  36. [36]

    Anna Gausen, Bhaskar Mitra, and Siân Lindley. 2024. A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers. In The 2024 ACM Conference on Fairness, Accountability, and Transparency. ACM, Rio de Janeiro Brazil, 207–220. doi:10.1145/3630106.3658900

  37. [37]

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé Iii, and Kate Crawford

  38. [38]

    Datasheets for datasets. Commun. ACM 64, 12 (Dec. 2021), 86–92. doi:10.1145/3458723. URL https://cacm.acm.org/research/datasheets-for-datasets/

  39. [39]

    Tarleton Gillespie, Ryland Shaw, Mary L Gray, and Jina Suh. 2026. AI Red-Teaming Is a Sociotechnical Problem. Commun. ACM (2026)

  40. [40]

    Delaram Golpayegani, Harshvardhan J Pandit, and Dave Lewis. 2023. To be high-risk, or not to be—semantic specifications and implications of the ai act’s high-risk ai applications and harmonised standards. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 905–915

  41. [41]

    Sinem Görücü, Yuheng Ren, Gabrielle Samuel, and Georgia Panagiotidou. 2025. "As an individual, I suppose you can’t really do much": Environmental Sustainability Perceptions of Machine Learning Practitioners. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 1312–1324

  42. [42]

    Eileen Guo, Jeroen van Raalte, Justin-Casimir Braun, Gabriel Geiger, Amanda Silverman, Eva Constantaras, Melissa Heikkilä, Tahmeed Shafiq, Alice Milliken, Crofton Black, and Daniel Howden. 2025. The Limits of Ethical AI. https://www.lighthousereports.com/investigation/the-limits-of-ethical-ai/. Accessed: 2026-01-14

  43. [43]

    Sacha Gutierrez, Dennis Nguyen, and Karin van Es. 2025. Tool, companion or a catalyst force? Exploring sociotechnical imaginaries Within AI livestreams’ communities of practice. Big Data & Society 12, 4 (Dec. 2025), 20539517251381663. doi:10.1177/20539517251381663 Publisher: SAGE Publications Ltd

  44. [44]

    Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, et al. 2025. Which Economic Tasks are Performed with AI. Evidence from Millions of Claude Conversations (2025)

  45. [45]

    Emma Harvey, Allison Koenecke, and Rene F. Kizilcec. 2025. "Don’t Forget the Teachers": Towards an Educator-Centered Understanding of Harms from Large Language Models in Education. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1–19. doi:10.1145/3706598.3713210

  46. [46]

    Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need? In Proceedings of the 2019 CHI conference on human factors in computing systems. 1–16

  47. [47]

    Alexis Shore Ingber and Nazanin Andalibi. 2025. Emotion AI in Job Interviews: Injustice, Emotional Labor, Identity, and Privacy. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. ACM, Athens Greece, 1–17. doi:10.1145/3715275.3732002

  48. [48]

    Maia Jacobs, Melanie F. Pradier, Thomas H. McCoy, Roy H. Perlis, Finale Doshi-Velez, and Krzysztof Z. Gajos. 2021. How machine-learning recommendations influence clinician treatment selections: the example of antidepressant selection. Translational Psychiatry 11, 1 (2021), 108. doi:10.1038/s41398-021-01224-x

  49. [49]

    Maurice Jakesch, Zana Buçinca, Saleema Amershi, and Alexandra Olteanu. 2022. How Different Groups Prioritize Ethical Values for Responsible AI. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 310–323. doi:10.1145/3531146.3533097

  50. [50]

    Stefan Jänicke, Greta Franzini, Muhammad Faisal Cheema, Gerik Scheuermann, et al. 2015. On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges. EuroVis (STARs) 2015 (2015), 83–103

  51. [51]

    Jiaming Ji, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Borong Zhang, Donghai Hong, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Hua Xu, Aidan O’Gara, Kwan Ng, Brian Tse, Jie Fu, Stephen Mcaleer, Yanfeng Wang, Mingchuan Yang, Yunhuai Liu, Yizhou Wang, Song-Chun Zhu, Yike Guo, Yaodong Yang, a...

  52. [52]

    Mackenzie Jorgensen, Hannah Richert, Elizabeth Black, Natalia Criado, and Jose Such. 2023. Not so fair: The impact of presumably fair machine learning models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 297–311

  53. [53]

    Nadia Karizat, Alexandra H Vinson, Shobita Parthasarathy, and Nazanin Andalibi. 2024. Patent applications as glimpses into the sociotechnical imaginary: ethical speculation on the imagined futures of emotion AI for mental health monitoring and detection. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–43

  54. [54]

    Anna Kawakami, Jordan Taylor, Sarah Fox, Haiyi Zhu, and Kenneth Holstein. 2026. AI failure loops in devalued work: The confluence of overconfidence in AI and underconfidence in worker expertise. Big Data & Society 13, 1 (2026), 20539517261424164

  55. [55]

    Os Keyes, Jevan Hutson, and Meredith Durbin. 2019. A Mulching Proposal: Analysing and Improving an Algorithmic System for Turning the Elderly into High-Nutrient Slurry. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI EA ’19). Association for Computing Machinery, New York, NY, USA, 1–11. doi:...

  56. [56]

    Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, and Scott A Hale. 2025. Why human–AI relationships need socioaffective alignment. Humanities and Social Sciences Communications 12, 1 (2025), 1–9

  57. [57]

    Kristina L. Kupferschmidt, Kieran O’Doherty, and Joshua A. Skorburg. 2025. Write on Paper, Wrong in Practice: Why LLMs Still Struggle with Writing Clinical Notes. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8, 2 (Oct. 2025), 1524–1534. doi:10.1609/aies.v8i2.36651

  58. [58]

    J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics (1977), 159–174

  59. [59]

    Hao-Ping Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–22

  60. [60]

    Hao-Ping Lee, Yu-Ju Yang, Thomas Serban Von Davier, Jodi Forlizzi, and Sauvik Das. 2024. Deepfakes, phrenology, surveillance, and more! A taxonomy of AI privacy risks. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–19

  61. [61]

    John D Lee and Katrina A See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors 46, 1 (2004), 50–80

  62. [62]

    Nancy G. Leveson. 2011. Engineering a safer world: systems thinking applied to safety. MIT Press, Cambridge (Mass.)

  63. [63]

    Megan Li, Wendy Bickersteth, Ningjing Tang, Lorrie Cranor, Jason Hong, Hong Shen, and Hoda Heidari. 2025. A Closer Look at the Existing Risks of Generative AI: Mapping the Who, What, and How of Real-World Incidents. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8, 2 (Oct. 2025), 1561–1573. doi:10.1609/aies.v8i2.36655

  64. [64]

    Isabella Loaiza and Roberto Rigobon. 2024. The EPOCH of AI: Human-Machine Complementarities at Work. doi:10.2139/ssrn.5028371

  65. [65]

    Jonathan Lynn, Rachel Y. Kim, Sicun Gao, Daniel Schneider, Sachin S. Pandya, and Min Kyung Lee. 2025. Regulating Algorithmic Management: A Multi-Stakeholder Study of Challenges in Aligning Software and the Law for Workplace Scheduling. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. ACM, Athens, Greece, 547–572. doi:...

  66. [66]

    Michael A. Madaio, Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach. 2020. Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–14. ...

  67. [67]

    Nestor Maslej, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Njenga Kariuki, Emily Capstick, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Toby Walsh, Armin Hamrah, Lapo Santarlasci, Julia Betts Lotufo, Alexandra Rome, Andrew Shi, and Sukrut O...

  68. [68]

    Sean McGregor. 2021. Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. Proceedings of the AAAI Conference on Artificial Intelligence 35, 17 (May 2021), 15458–15463. doi:10.1609/aaai.v35i17.17817

  69. [69]

    Kevin R McKee, Xuechunzi Bai, and Susan T Fiske. 2024. Warmth and competence in human-agent cooperation. Autonomous Agents and Multi-Agent Systems 38, 1 (2024), 23

  70. [70]

    National Center for O*NET Development. 2026. O*NET Database. https://www.onetcenter.org/database.html. Accessed: 2026-03-24

  71. [71]

    Nataliya Nedzhvetskaya and JS Tan. 2024. No Simple Fix: How AI Harms Reflect Power and Jurisdiction in the Workplace. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 422–432. doi:10.1145/3630106.3658915

  72. [72]

    Don Norman. 2013. The design of everyday things: Revised and expanded edition. Basic Books

  73. [73]

    OECD. 2025. Towards a common reporting framework for AI incidents. OECD Artificial Intelligence Papers, No. 34. The Organisation for Economic Co-operation and Development (OECD). doi:10.1787/f326d4ac-en

  74. [74]

    Kazuo Okamura and Seiji Yamada. 2020. Adaptive trust calibration for human-AI collaboration. PLOS ONE 15, 2 (2020), e0229132

  75. [75]

    Lauren Olson, Ricarda Anna-Lena Fischer, Florian Kunneman, and Emitzá Guzmán. 2025. Who Speaks for Ethics? How Demographics Shape Ethical Advocacy in Software Development. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 2847–2862

  76. [76]

    OpenAI. 2023. GPT-5: Large Language Model. https://openai.com/research/gpt-5. Accessed: 2026-03-24

  77. [77]

    Charles Perrow. 1984. Normal Accidents: Living with High-Risk Technologies. Basic Books

  78. [78]

    Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 33–44

  79. [79]

    Bogdana Rakova, Jingying Yang, Henriette Cramer, and Rumman Chowdhury. 2021. Where responsible AI meets reality: Practitioner perspectives on enablers for shifting organizational practices. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (2021), 1–23

  80. [80]

    Jaspreet Ranjit, Ke Zhou, Swabha Swayamdipta, and Daniele Quercia. 2026. Are We Automating the Joy Out of Work? Designing AI to Augment Work, Not Meaning. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. 1–46
