You Shall Not Pass! Where and Why Developers Draw The Line on AI Autonomy

Anita Sarma; Carmen Badea; Christian Bird; Marco Gerosa; Rudrajit Choudhuri

arxiv: 2607.00533 · v1 · pith:4UZYOKXCnew · submitted 2026-07-01 · 💻 cs.HC · cs.SE

You Shall Not Pass! Where and Why Developers Draw The Line on AI Autonomy

Rudrajit Choudhuri , Christian Bird , Carmen Badea , Marco Gerosa , Anita Sarma This is my paper

Pith reviewed 2026-07-02 06:31 UTC · model grok-4.3

classification 💻 cs.HC cs.SE

keywords AI autonomysoftware developerstask characteristicscognitive appraisalwork designautonomy preferencesmixed-methods studyprofessional developers

0 comments

The pith

Developers accept AI producing work under oversight but grant less autonomy for identity-defining and design tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines where professional developers set boundaries on AI autonomy during software engineering work. A mixed-methods study of 448 Microsoft developers shows most accept AI generating outputs provided they retain oversight, but acceptance drops sharply for tasks tied to professional identity, human interaction, or creative design. Acceptance rises with greater AI experience and personal risk tolerance. Task accountability reduces willingness to let AI act independently on the developer's behalf, while task identity reduces willingness to delegate decisions, though high task demands increase such delegation.

Core claim

Most developers accepted AI producing work under their oversight, although accepted autonomy varied substantively across tasks and individuals. Acceptance was lowest for identity-defining, human-facing, and design-oriented work, and higher among developers with more AI experience and risk tolerance. Task accountability was associated with lower odds of allowing AI to act on developers' behalf, whereas task identity was associated with lower odds of granting AI decision-making autonomy. Task demands had the opposite effect, increasing willingness to delegate decision-making to AI.

What carries the argument

Associations between task characteristics (accountability, identity, demands) and accepted AI autonomy levels, measured via survey and interpreted through cognitive appraisal and work design theories.

If this is right

Developers with more AI experience show higher acceptance of autonomy across tasks.
Higher task demands increase willingness to delegate decision-making to AI.
Task accountability lowers the odds of permitting AI to act independently on a developer's behalf.
Preferences for AI autonomy track how developers cognitively experience their own work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

AI tool interfaces could adapt autonomy levels based on detected task identity or accountability.
The pattern of resistance in identity-defining work may appear in other professional domains such as law or medicine.
Individual risk tolerance could be assessed to personalize AI assistance levels within teams.
As AI capabilities grow, repeated surveys might reveal whether these boundaries shift over time.

Load-bearing premise

That self-reported preferences from this single-company sample of developers reflect stable attitudes toward AI autonomy that hold beyond this population and context.

What would settle it

A replication survey at multiple non-Microsoft companies showing no link between task identity and reduced acceptance of AI decision-making autonomy.

Figures

Figures reproduced from arXiv: 2607.00533 by Anita Sarma, Carmen Badea, Christian Bird, Marco Gerosa, Rudrajit Choudhuri.

**Figure 1.** Figure 1: Participant response distribution (𝑛=1,535 responses; 448 developers) across the five autonomy levels. Rows group responses by SDLC category, with the top row pooling all categories; columns order the levels L1 to L5, defined by who decides and who acts. Each row reports the share of responses at each level, and the proportion of autonomy participants kept versus ceded to AI, split at each category’s media… view at source ↗

read the original abstract

As AI takes on more software work, the line between human and AI effort is shifting. Where developers draw that line around AI autonomy bears on how we design tools and roles that preserve meaningful work. Drawing on cognitive appraisal theory, work design, and automation research, we conducted a mixed-methods study of 448 professional developers at Microsoft to investigate their accepted levels of AI autonomy across software engineering work. Most developers accepted AI producing work under their oversight, although accepted autonomy varied substantively across tasks and individuals. Acceptance was lowest for identity-defining, human-facing, and design-oriented work, and higher among developers with more AI experience and risk tolerance. Task accountability was associated with lower odds of allowing AI to act on developers' behalf, whereas task identity was associated with lower odds of granting AI decision-making autonomy. Task demands had the opposite effect, increasing willingness to delegate decision-making to AI. Our findings suggest that preferences for AI autonomy reflect how developers cognitively experience their work, highlighting important considerations for designing meaningful work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Survey of 448 Microsoft developers maps task identity and accountability to lower AI autonomy acceptance, but single-firm self-reports are the clear limit on broader use.

read the letter

The core result is that most of these developers accept AI producing work under oversight, yet acceptance drops for identity-defining, human-facing, or design tasks. Task accountability links to lower odds of letting AI act on their behalf, task identity to lower decision-making autonomy, and higher task demands to more willingness to delegate decisions. Experience with AI and risk tolerance raise acceptance overall.

The paper does a straightforward job collecting responses across a range of SE tasks and connecting them to cognitive appraisal and work design ideas. The mixed-methods piece and the breakdown by autonomy dimensions (oversight vs. decision-making vs. acting) give concrete patterns that prior automation work lacked in this domain.

The obvious weakness is the sample: all 448 respondents come from one company. That makes it hard to separate Microsoft-specific factors from general developer preferences. Self-reported acceptance is also one step removed from observed behavior, and the abstract gives no sampling details, controls, effect sizes, or model specs, so the strength of the logistic associations is difficult to judge from what is shown.

This is for researchers and tool builders working on AI in software engineering who need data on where developers actually draw boundaries. Anyone studying human-AI role shifts in industry settings will find the task-level findings usable. It deserves peer review because the data collection is sizable and the questions matter for current practice, even if methods transparency and external checks will be required in revision.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a mixed-methods survey of 448 professional developers at Microsoft on accepted levels of AI autonomy across software engineering tasks. Drawing on cognitive appraisal theory and work design research, it finds that most developers accept AI producing work under human oversight, with acceptance varying across tasks (lowest for identity-defining, human-facing, and design-oriented work) and individuals (higher with greater AI experience and risk tolerance). Logistic associations indicate that task accountability lowers odds of allowing AI to act on the developer's behalf, task identity lowers odds of granting decision-making autonomy, and task demands increase willingness to delegate decision-making. The findings are interpreted as reflecting how developers cognitively experience their work, with implications for tool and role design.

Significance. If the reported associations hold after full methodological scrutiny, the work supplies concrete empirical patterns on developer preferences for AI autonomy that can inform HCI and software engineering tool design aimed at preserving meaningful work. The mixed-methods design and explicit grounding in established theories (cognitive appraisal, work design) are strengths that allow the authors to move beyond purely descriptive results. The single-organization sample, however, constrains claims about broader developer populations.

major comments (2)

[Methods] Methods section: The manuscript provides no details on sampling frame, recruitment procedure, response rate, or demographic controls for the 448 Microsoft respondents. Without these, it is impossible to evaluate selection bias or to interpret the reported logistic associations (task accountability, task identity, task demands) as stable preferences rather than artifacts of the specific Microsoft context and tooling.
[Discussion] Discussion and Conclusion: The central interpretive claim that 'preferences for AI autonomy reflect how developers cognitively experience their work' generalizes from a single-company sample to 'developers' broadly. This step is load-bearing for the paper's contribution to tool design but is not supported by cross-organization replication or behavioral validation data.

minor comments (2)

[Abstract] Abstract: Effect sizes, model specifications, and confidence intervals for the logistic regressions are not reported, making it difficult to assess the substantive importance of the reported associations.
[Results] Results: Tables presenting the logistic models should include the full set of predictors, variance inflation factors, and goodness-of-fit statistics to allow readers to judge robustness.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [Methods] Methods section: The manuscript provides no details on sampling frame, recruitment procedure, response rate, or demographic controls for the 448 Microsoft respondents. Without these, it is impossible to evaluate selection bias or to interpret the reported logistic associations (task accountability, task identity, task demands) as stable preferences rather than artifacts of the specific Microsoft context and tooling.

Authors: We agree that the current Methods section lacks sufficient detail on these aspects. In the revised manuscript we will expand the Methods section to include the sampling frame (Microsoft's internal developer population), recruitment procedure (company survey distribution), response rate, and available demographic characteristics. These additions will allow readers to better assess potential selection bias and contextualize the logistic associations within the Microsoft setting. revision: yes
Referee: [Discussion] Discussion and Conclusion: The central interpretive claim that 'preferences for AI autonomy reflect how developers cognitively experience their work' generalizes from a single-company sample to 'developers' broadly. This step is load-bearing for the paper's contribution to tool design but is not supported by cross-organization replication or behavioral validation data.

Authors: We acknowledge that the single-organization sample constrains broad generalization. The manuscript already identifies the sample as Microsoft developers; we will revise the Discussion and Conclusion to more explicitly caveat that the observed patterns reflect cognitive experiences in this specific context and to emphasize implications for tool design primarily within similar large technology organizations. The theoretical grounding in cognitive appraisal and work design theories supports the interpretation as a basis for future research, but we will avoid overgeneralizing to all developers. revision: partial

standing simulated objections not resolved

Absence of cross-organization replication data or behavioral validation measures beyond the survey responses

Circularity Check

0 steps flagged

No circularity: empirical survey findings rest on direct data analysis

full rationale

This is a mixed-methods empirical study reporting survey responses and logistic associations from 448 developers. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described approach. Claims about task accountability, identity, and autonomy preferences are presented as direct outputs of the data collection and analysis rather than reductions to prior inputs by construction. The paper is self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the applicability of cognitive appraisal theory and work design frameworks to AI delegation decisions plus the assumption that the Microsoft developer sample and survey instrument capture generalizable preferences; no free parameters or invented entities are introduced.

axioms (1)

domain assumption Cognitive appraisal theory, work design, and automation research frameworks apply to developers' decisions about AI autonomy in software engineering tasks.
Invoked to interpret survey associations between task characteristics and autonomy acceptance.

pith-pipeline@v0.9.1-grok · 5714 in / 1427 out tokens · 28620 ms · 2026-07-02T06:31:02.220339+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

95 extracted references · 20 canonical work pages · 5 internal anchors

[1]

[n. d.]. Supplemental Package. https://zenodo.org/record/21060399

work page arXiv
[2]

Sadia Afroz, Zixuan Feng, Tyler Menezes, Katie Kimura, Bianca Trinkenreich, Igor Steinmacher, and Anita Sarma. 2026. The Fast and Spurious: Developer Productivity with GenAI. ACM International Conference on the Foundations of Software Engineering

2026
[3]

Akerlof and Rachel E

George A. Akerlof and Rachel E. Kranton. 2000. Economics and Identity.The Quarterly Journal of Economics115, 3 (2000), 715–753

2000
[4]

Moaath Alshaikh, Tasneem Alshaher, Ricardo Vieira, Beatriz Santana, Clelio Xavier, Jose Amancio, Glauco Carneiro, Julio Leite, Savio Freire, and Manoel Mendonca. 2026. Prompt Engineering Strategies for LLM-based Qualitative Cod- ing of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study. InProceedings of the 1st Internat...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

Andrew Anderson, Jimena Noa Guevara, Fatima Moussaoui, Tianyi Li, Mihaela Vorvoreanu, and Margaret Burnett. 2022. Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving Styles.ACM Transactions on Interactive Intelligent Systems(2022)

2022
[6]

Anthropic. 2025. Claude. https://claude.ai/

2025
[7]

Julian Ashwin, Aditya Chhabra, and Vijayendra Rao. 2026. Using large language models for qualitative analysis can introduce serious bias.Sociological Methods & Research55, 3 (2026), 795–839

2026
[8]

David H Autor. 2015. Why are there still so many jobs? The history and future of workplace automation.Journal of economic perspectives29, 3 (2015), 3–30

2015
[9]

Lisanne Bainbridge. 1983. Ironies of automation. InAnalysis, design and evaluation of man–machine systems. Elsevier, 129–135

1983
[10]

Arnold B Bakker and Evangelia Demerouti. 2007. The job demands-resources model: State of the art.Journal of managerial psychology22, 3 (2007), 309–328

2007
[11]

Victor R Basili, Forrest Shull, and Filippo Lanubile. 2002. Building knowledge through families of experiments.IEEE transactions on software engineering25, 4 (2002), 456–473

2002
[12]

Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue20, 6 (2022), 35–57

2022
[13]

Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. 2011. Don’t touch my code! Examining the effects of You Shall Not Pass! Where and Why Developers Draw The Line on AI Autonomy Oregon State University, Northern Arizona University, & Microsoft Research, USA, 2026 ownership on software quality. InFSE. 4–14

2011
[14]

Barry Boehm, Victor R Basili, et al. 2005. Software defect reduction top 10 list. Foundations of empirical software engineering: the legacy of Victor R. Basili426, 37 (2005), 426–431

2005
[15]

Virginia Braun and Victoria Clark. 2006. Using thematic analysis in psychology. Qualitative research in psychology3, 2 (2006), 77–101

2006
[16]

Virginia Braun and Victoria Clarke. 2022. Conceptual and design thinking for thematic analysis.Qualitative Psychology9, 1 (2022), 3

2022
[17]

1987.No silver bullet

Frederick Brooks and H Kugler. 1987.No silver bullet. April

1987
[18]

Adam Brown, Sarah D’Angelo, Ambar Murillo, Ciera Jaspan, and Collin Green
[19]

In Proceedings of the 1st ACM International Conference on AI-Powered Software

Identifying the factors that influence trust in AI code completion. In Proceedings of the 1st ACM International Conference on AI-Powered Software. 1–9
[20]

Jenna Butler, Jina Suh, Sankeerti Haniyur, and Constance Hadley. 2025. Dear Diary: A randomized controlled trial of Generative AI coding tools in the work- place. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 319–329

2025
[21]

Tavis S Campbell, Jillian A Johnson, and Kristin A Zernicke. 2020. Cognitive appraisal. InEncyclopedia of behavioral medicine. Springer, 486–487

2020
[22]

Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, and Denae Ford. 2023. ‘It would work for me too’: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools.ACM Transactions on Interactive Intelligent Systems(2023)

2023
[23]

Rudrajit Choudhuri, Carmen Badea, Christian Bird, Jenna Butler, Robert DeLine, and Brian Houck. 2026. AI Where It Matters: Where, Why, and How Developers Want AI Support in Daily Work. InProceedings of the 48th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

2026
[24]

Rudrajit Choudhuri, Christian Bird, Carmen Badea, and Anita Sarma. 2026. To Copilot and Beyond: 22 AI Systems Developers Want Built.arXiv preprint arXiv:2604.07830(2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[25]

Rudrajit Choudhuri, Christopher A Sanchez, Margaret Burnett, and Anita Sarma
[26]

Thinking Less, Trusting More: GenAI’s Impacts on Students’ Cognitive Habits.SSRN(2026)

2026
[27]

Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Guides Our Choices? Modeling Developers’ Trust and Behavioral Intentions Towards GenAI. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 1691–1703

2025
[28]

Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Needs Attention? Prioritizing Drivers of Developers’ Trust and Adoption of Generative AI.arXiv preprint arXiv:2505.17418(2025)

work page arXiv 2025
[29]

2023.ordinal—Regression Models for Ordinal Data

Rune Haubo Bojesen Christensen. 2023.ordinal—Regression Models for Ordinal Data. https://CRAN.R-project.org/package=ordinal R package version 2023.12-4

2023
[30]

2013.Statistical power analysis for the behavioral sciences

Jacob Cohen. 2013.Statistical power analysis for the behavioral sciences. Routledge

2013
[31]

Kevin Crowston and Francesco Bolici. 2025. Deskilling and upskilling with AI systems.Information Research an international electronic journal30, iConf (2025), 1009–1023

2025
[32]

Joost CF De Winter and Dimitra Dodou. 2014. Why the Fitts list has persisted throughout the history of function allocation.Cognition, Technology & Work16, 1 (2014), 1–11

2014
[33]

Zackary Okun Dunivin. 2024. Scalable Qualitative Coding with LLMs: Chain-of- Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks. arXiv preprint arXiv:2401.15170(2024)

work page arXiv 2024
[34]

Riedl, and Sara Alcorn

Upol Ehsan, Samir Passi, Koustuv Saha, Todd McNutt, Mark O. Riedl, and Sara Alcorn. 2026. From Future of Work to Future of Workers: Addressing Asymp- tomatic AI Harms to Foster Dignified human-AI Interaction. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26). ACM. doi:10.1145/3772318.3791081

work page doi:10.1145/3772318.3791081 2026
[35]

Endsley and Esin O

Mica R. Endsley and Esin O. Kiris. 1995. The Out-of-the-Loop Performance Problem and Level of Control in Automation.Human Factors37, 2 (1995), 381–

1995
[36]

doi:10.1518/001872095779064555

work page doi:10.1518/001872095779064555
[37]

Zixuan Feng, Sadia Afroz, and Anita Sarma. 2025. From Gains to Strains: Modeling Developer Burnout with GenAI Adoption.arXiv preprint arXiv:2510.07435(2025)

work page arXiv 2025
[38]

Zixuan Feng, Reed Milewicz, Emerson Murphy-Hill, Tyler Menezes, Alexander Serebrenik, Igor Steinmacher, and Anita Sarma. 2025. Charting Uncertain Waters: A Socio-Technical Roadmap for Sustaining Open Source Communities in the Age of GenAI.ACM Transactions on Software Engineering and Methodology(2025). doi:10.1145/3789210

work page doi:10.1145/3789210 2025
[39]

Paul M Fitts. 1951. Human engineering for an effective air-navigation and traffic- control system. (1951)

1951
[40]

Bent Flyvbjerg. 2006. Five misunderstandings about case-study research.Quali- tative inquiry12, 2 (2006), 219–245

2006
[41]

Yitzhak Fried and Gerald R Ferris. 1987. The validity of the job characteristics model: A review and meta-analysis.Personnel psychology40, 2 (1987), 287–322

1987
[42]

Frink and Richard J

Dwight D. Frink and Richard J. Klimoski. 1998. Toward a Theory of Accountability in Organizations and Human Resources Management. InResearch in Personnel and Human Resources Management, Gerald R. Ferris (Ed.). Vol. 16. JAI Press, Greenwich, CT, 1–51

1998
[43]

2007.Data analysis using regression and multilevel/hierarchical models

Andrew Gelman and Jennifer Hill. 2007.Data analysis using regression and multilevel/hierarchical models. Cambridge university press

2007
[44]

Amir Ghorbani, Nathan Cassee, Derek Robinson, Adam Alami, Neil A Ernst, Alexander Serebrenik, and Andrzej Wąsowski. 2023. Autonomy is an acquired taste: Exploring developer preferences for GitHub bots. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1405–1417

2023
[45]

2014.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters

Kilem L Gwet. 2014.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC

2014
[46]

J Richard Hackman and Greg R Oldham. 1976. Motivation through the design of work: Test of a theory.Organizational behavior and human performance16, 2 (1976), 250–279

1976
[47]

Joseph F Hair. 2009. Multivariate data analysis. (2009)

2009
[48]

Angela T Hall, Dwight D Frink, and M Ronald Buckley. 2017. An accountability account: A review and synthesis of the theoretical and empirical research on felt accountability.Journal of Organizational Behavior38, 2 (2017), 204–224

2017
[49]

Donald Hedeker and Robert D Gibbons. 1994. A random-effects ordinal regression model for multilevel analysis.Biometrics(1994), 933–944

1994
[50]

Angjelin Hila and Elliott Hauser. 2025. Assessing the Reliability of Large Language Models for Deductive Qualitative Coding: A Comparative Study of ChatGPT Interventions.arXiv preprint arXiv:2507.14384(2025)

work page arXiv 2025
[51]

Stephen E Humphrey, Jennifer D Nahrgang, and Frederick P Morgeson. 2007. Integrating motivational, social, and contextual work design features: a meta- analytic summary and theoretical extension of the work design literature.Journal of applied psychology92, 5 (2007), 1332

2007
[52]

Brittany Johnson, Christian Bird, Denae Ford, Nicole Forsgren, and Thomas Zimmermann. 2023. Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools. In2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 409–419

2023
[53]

William A Kahn. 1990. Psychological conditions of personal engagement and disengagement at work.Academy of management journal33, 4 (1990), 692–724

1990
[54]

Mansi Khemka and Brian Houck. 2024. Toward Effective AI Support for Devel- opers: A survey of desires and concerns.Commun. ACM67, 11 (2024), 42–49

2024
[55]

Barbara A Kitchenham and Shari L Pfleeger. 2008. Personal opinion surveys. In Guide to advanced empirical software engineering. Springer, 63–92

2008
[56]

Richard Koestner, Natasha Lekes, Theodore A Powers, and Emanuel Chicoine
[57]

Attaining personal goals: self-concordance plus implementation intentions equals success.Journal of personality and social psychology83, 1 (2002), 231

2002
[58]

Sukrit Kumar, Drishti Goel, Thomas Zimmermann, Brian Houck, Balasubra- manyan Ashok, and Chetan Bansal. 2025. Time Warp: The Gap Between Devel- opers’ Ideal vs Actual Workweeks in an AI-Driven Era. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 12–22

2025
[59]

Stefano Lambiase, Gemma Catolino, Fabio Palomba, Filomena Ferrucci, and Daniel Russo. 2025. Exploring Individual Factors in the Adoption of LLMs for Specific Software Engineering Tasks.arXiv preprint arXiv:2504.02553(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[60]

1991.Emotion and adaptation

Richard S Lazarus. 1991.Emotion and adaptation. Oxford University Press

1991
[61]

John D Lee and Katrina A See. 2004. Trust in automation: Designing for appro- priate reliance.Human factors46, 1 (2004), 50–80

2004
[62]

LePine, Nathan P

Jeffery A. LePine, Nathan P. Podsakoff, and Marcie A. LePine. 2005. A Meta- Analytic Test of the Challenge Stressor–Hindrance Stressor Framework: An Explanation for Inconsistent Relationships Among Stressors and Performance. Academy of Management Journal48, 5 (2005), 764–775. doi:10.5465/amj.2005. 18803921

work page doi:10.5465/amj.2005 2005
[63]

Jennifer S Lerner and Philip E Tetlock. 1999. Accounting for the effects of accountability.Psychological bulletin125, 2 (1999), 255

1999
[64]

Zijian Li, Luzhen Tang, Mengyu Xia, Xinyu Li, Naping Chen, Dragan Gašević, and Yizhou Fan. 2026. When LLMs Fall Short in Deductive Coding: Model Comparisons and Human-AI Collaboration Workflow Design. InProceedings of the 16th International Conference on Learning Analytics and Knowledge (LAK ’26). arXiv:2512.21041

work page arXiv 2026
[65]

Hannah Limerick, David Coyle, and James W Moore. 2014. The experience of agency in human-computer interactions: a review.Frontiers in human neuro- science8 (2014), 643

2014
[66]

Marjolein Lips-Wiersma and Lani Morris. 2009. Discriminating between ‘mean- ingful work’and the ‘management of meaning’.Journal of business ethics88, 3 (2009), 491–511

2009
[67]

Brian Lubars and Chenhao Tan. 2019. Ask not what AI can do, but what AI should do: Towards a framework of task delegability.Advances in neural information processing systems32 (2019)

2019
[68]

Russell A Matthews, Laura Pineault, and Yeong-Hyun Hong. 2022. Normalizing the use of single-item measures: Validation of the single-item compendium for organizational psychology.Journal of Business and Psychology37, 4 (2022), 639– 673

2022
[69]

McKelvey and William Zavoina

Richard D. McKelvey and William Zavoina. 1975. A statistical model for the analysis of ordinal level dependent variables.Journal of Mathematical Sociology 4, 1 (1975), 103–120. doi:10.1080/0022250X.1975.9989847

work page doi:10.1080/0022250x.1975.9989847 1975
[70]

John P Meyer and Natalie J Allen. 1991. A three-component conceptualization of organizational commitment.Human resource management review1, 1 (1991), Oregon State University, Northern Arizona University, & Microsoft Research, USA, 2026 Choudhuri et al. 61–89

1991
[71]

Microsoft. 2024. Copilot. https://copilot.microsoft.com

2024
[72]

Courtney Miller, Rudrajit Choudhuri, Mara Ulloa, Sankeerti Haniyur, Robert DeLine, Margaret-Anne Storey, Emerson Murphy-Hill, Christian Bird, and Jenna L Butler. 2025. ‘Maybe We Need Some More Examples:’ Individual and Team Drivers of Developer GenAI Tool Use.arXiv preprint arXiv:2507.21280(2025)

work page arXiv 2025
[73]

Shinichi Nakagawa and Holger Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models.Methods in Ecology and Evolution4, 2 (2013), 133–142. doi:10.1111/j.2041-210x.2012.00261.x

work page doi:10.1111/j.2041-210x.2012.00261.x 2013
[74]

Raja Parasuraman and Dietrich H Manzey. 2010. Complacency and bias in human use of automation: An attentional integration.Human factors52, 3 (2010), 381–410

2010
[75]

Raja Parasuraman, Thomas B Sheridan, and Christopher D Wickens. 2000. A model for types and levels of human interaction with automation.IEEE Transac- tions on systems, man, and cybernetics-Part A: Systems and Humans30, 3 (2000), 286–297

2000
[76]

Guilherme Vaz Pereira, Victoria Jackson, Rafael Prikladnicki, André van der Hoek, Luciane Fortes, Carolina Araújo, André Coelho, Ligia Chelli, and Diego Ramos
[77]

In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 330–341
[78]

Qualtrics. 2026. Qualtrics Survey Platform. https://www.qualtrics.com. Accessed June 11, 2026

2026
[79]

Foyzur Rahman and Premkumar Devanbu. 2011. Ownership, experience and defects: a fine-grained study of authorship. InProceedings of the 33rd international conference on software engineering. 491–500

2011
[80]

Ira J Roseman and Craig A Smith. 2001. Appraisal theory.Appraisal processes in emotion: Theory, methods, research(2001), 3–19

2001

Showing first 80 references.

[1] [1]

[n. d.]. Supplemental Package. https://zenodo.org/record/21060399

work page arXiv

[2] [2]

Sadia Afroz, Zixuan Feng, Tyler Menezes, Katie Kimura, Bianca Trinkenreich, Igor Steinmacher, and Anita Sarma. 2026. The Fast and Spurious: Developer Productivity with GenAI. ACM International Conference on the Foundations of Software Engineering

2026

[3] [3]

Akerlof and Rachel E

George A. Akerlof and Rachel E. Kranton. 2000. Economics and Identity.The Quarterly Journal of Economics115, 3 (2000), 715–753

2000

[4] [4]

Moaath Alshaikh, Tasneem Alshaher, Ricardo Vieira, Beatriz Santana, Clelio Xavier, Jose Amancio, Glauco Carneiro, Julio Leite, Savio Freire, and Manoel Mendonca. 2026. Prompt Engineering Strategies for LLM-based Qualitative Cod- ing of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study. InProceedings of the 1st Internat...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[5] [5]

Andrew Anderson, Jimena Noa Guevara, Fatima Moussaoui, Tianyi Li, Mihaela Vorvoreanu, and Margaret Burnett. 2022. Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving Styles.ACM Transactions on Interactive Intelligent Systems(2022)

2022

[6] [6]

Anthropic. 2025. Claude. https://claude.ai/

2025

[7] [7]

Julian Ashwin, Aditya Chhabra, and Vijayendra Rao. 2026. Using large language models for qualitative analysis can introduce serious bias.Sociological Methods & Research55, 3 (2026), 795–839

2026

[8] [8]

David H Autor. 2015. Why are there still so many jobs? The history and future of workplace automation.Journal of economic perspectives29, 3 (2015), 3–30

2015

[9] [9]

Lisanne Bainbridge. 1983. Ironies of automation. InAnalysis, design and evaluation of man–machine systems. Elsevier, 129–135

1983

[10] [10]

Arnold B Bakker and Evangelia Demerouti. 2007. The job demands-resources model: State of the art.Journal of managerial psychology22, 3 (2007), 309–328

2007

[11] [11]

Victor R Basili, Forrest Shull, and Filippo Lanubile. 2002. Building knowledge through families of experiments.IEEE transactions on software engineering25, 4 (2002), 456–473

2002

[12] [12]

Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue20, 6 (2022), 35–57

2022

[13] [13]

Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. 2011. Don’t touch my code! Examining the effects of You Shall Not Pass! Where and Why Developers Draw The Line on AI Autonomy Oregon State University, Northern Arizona University, & Microsoft Research, USA, 2026 ownership on software quality. InFSE. 4–14

2011

[14] [14]

Barry Boehm, Victor R Basili, et al. 2005. Software defect reduction top 10 list. Foundations of empirical software engineering: the legacy of Victor R. Basili426, 37 (2005), 426–431

2005

[15] [15]

Virginia Braun and Victoria Clark. 2006. Using thematic analysis in psychology. Qualitative research in psychology3, 2 (2006), 77–101

2006

[16] [16]

Virginia Braun and Victoria Clarke. 2022. Conceptual and design thinking for thematic analysis.Qualitative Psychology9, 1 (2022), 3

2022

[17] [17]

1987.No silver bullet

Frederick Brooks and H Kugler. 1987.No silver bullet. April

1987

[18] [18]

Adam Brown, Sarah D’Angelo, Ambar Murillo, Ciera Jaspan, and Collin Green

[19] [19]

In Proceedings of the 1st ACM International Conference on AI-Powered Software

Identifying the factors that influence trust in AI code completion. In Proceedings of the 1st ACM International Conference on AI-Powered Software. 1–9

[20] [20]

Jenna Butler, Jina Suh, Sankeerti Haniyur, and Constance Hadley. 2025. Dear Diary: A randomized controlled trial of Generative AI coding tools in the work- place. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 319–329

2025

[21] [21]

Tavis S Campbell, Jillian A Johnson, and Kristin A Zernicke. 2020. Cognitive appraisal. InEncyclopedia of behavioral medicine. Springer, 486–487

2020

[22] [22]

Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, and Denae Ford. 2023. ‘It would work for me too’: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools.ACM Transactions on Interactive Intelligent Systems(2023)

2023

[23] [23]

Rudrajit Choudhuri, Carmen Badea, Christian Bird, Jenna Butler, Robert DeLine, and Brian Houck. 2026. AI Where It Matters: Where, Why, and How Developers Want AI Support in Daily Work. InProceedings of the 48th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

2026

[24] [24]

Rudrajit Choudhuri, Christian Bird, Carmen Badea, and Anita Sarma. 2026. To Copilot and Beyond: 22 AI Systems Developers Want Built.arXiv preprint arXiv:2604.07830(2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026

[25] [25]

Rudrajit Choudhuri, Christopher A Sanchez, Margaret Burnett, and Anita Sarma

[26] [26]

Thinking Less, Trusting More: GenAI’s Impacts on Students’ Cognitive Habits.SSRN(2026)

2026

[27] [27]

Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Guides Our Choices? Modeling Developers’ Trust and Behavioral Intentions Towards GenAI. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 1691–1703

2025

[28] [28]

Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Needs Attention? Prioritizing Drivers of Developers’ Trust and Adoption of Generative AI.arXiv preprint arXiv:2505.17418(2025)

work page arXiv 2025

[29] [29]

2023.ordinal—Regression Models for Ordinal Data

Rune Haubo Bojesen Christensen. 2023.ordinal—Regression Models for Ordinal Data. https://CRAN.R-project.org/package=ordinal R package version 2023.12-4

2023

[30] [30]

2013.Statistical power analysis for the behavioral sciences

Jacob Cohen. 2013.Statistical power analysis for the behavioral sciences. Routledge

2013

[31] [31]

Kevin Crowston and Francesco Bolici. 2025. Deskilling and upskilling with AI systems.Information Research an international electronic journal30, iConf (2025), 1009–1023

2025

[32] [32]

Joost CF De Winter and Dimitra Dodou. 2014. Why the Fitts list has persisted throughout the history of function allocation.Cognition, Technology & Work16, 1 (2014), 1–11

2014

[33] [33]

Zackary Okun Dunivin. 2024. Scalable Qualitative Coding with LLMs: Chain-of- Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks. arXiv preprint arXiv:2401.15170(2024)

work page arXiv 2024

[34] [34]

Riedl, and Sara Alcorn

Upol Ehsan, Samir Passi, Koustuv Saha, Todd McNutt, Mark O. Riedl, and Sara Alcorn. 2026. From Future of Work to Future of Workers: Addressing Asymp- tomatic AI Harms to Foster Dignified human-AI Interaction. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26). ACM. doi:10.1145/3772318.3791081

work page doi:10.1145/3772318.3791081 2026

[35] [35]

Endsley and Esin O

Mica R. Endsley and Esin O. Kiris. 1995. The Out-of-the-Loop Performance Problem and Level of Control in Automation.Human Factors37, 2 (1995), 381–

1995

[36] [36]

doi:10.1518/001872095779064555

work page doi:10.1518/001872095779064555

[37] [37]

Zixuan Feng, Sadia Afroz, and Anita Sarma. 2025. From Gains to Strains: Modeling Developer Burnout with GenAI Adoption.arXiv preprint arXiv:2510.07435(2025)

work page arXiv 2025

[38] [38]

Zixuan Feng, Reed Milewicz, Emerson Murphy-Hill, Tyler Menezes, Alexander Serebrenik, Igor Steinmacher, and Anita Sarma. 2025. Charting Uncertain Waters: A Socio-Technical Roadmap for Sustaining Open Source Communities in the Age of GenAI.ACM Transactions on Software Engineering and Methodology(2025). doi:10.1145/3789210

work page doi:10.1145/3789210 2025

[39] [39]

Paul M Fitts. 1951. Human engineering for an effective air-navigation and traffic- control system. (1951)

1951

[40] [40]

Bent Flyvbjerg. 2006. Five misunderstandings about case-study research.Quali- tative inquiry12, 2 (2006), 219–245

2006

[41] [41]

Yitzhak Fried and Gerald R Ferris. 1987. The validity of the job characteristics model: A review and meta-analysis.Personnel psychology40, 2 (1987), 287–322

1987

[42] [42]

Frink and Richard J

Dwight D. Frink and Richard J. Klimoski. 1998. Toward a Theory of Accountability in Organizations and Human Resources Management. InResearch in Personnel and Human Resources Management, Gerald R. Ferris (Ed.). Vol. 16. JAI Press, Greenwich, CT, 1–51

1998

[43] [43]

2007.Data analysis using regression and multilevel/hierarchical models

Andrew Gelman and Jennifer Hill. 2007.Data analysis using regression and multilevel/hierarchical models. Cambridge university press

2007

[44] [44]

Amir Ghorbani, Nathan Cassee, Derek Robinson, Adam Alami, Neil A Ernst, Alexander Serebrenik, and Andrzej Wąsowski. 2023. Autonomy is an acquired taste: Exploring developer preferences for GitHub bots. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1405–1417

2023

[45] [45]

2014.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters

Kilem L Gwet. 2014.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC

2014

[46] [46]

J Richard Hackman and Greg R Oldham. 1976. Motivation through the design of work: Test of a theory.Organizational behavior and human performance16, 2 (1976), 250–279

1976

[47] [47]

Joseph F Hair. 2009. Multivariate data analysis. (2009)

2009

[48] [48]

Angela T Hall, Dwight D Frink, and M Ronald Buckley. 2017. An accountability account: A review and synthesis of the theoretical and empirical research on felt accountability.Journal of Organizational Behavior38, 2 (2017), 204–224

2017

[49] [49]

Donald Hedeker and Robert D Gibbons. 1994. A random-effects ordinal regression model for multilevel analysis.Biometrics(1994), 933–944

1994

[50] [50]

Angjelin Hila and Elliott Hauser. 2025. Assessing the Reliability of Large Language Models for Deductive Qualitative Coding: A Comparative Study of ChatGPT Interventions.arXiv preprint arXiv:2507.14384(2025)

work page arXiv 2025

[51] [51]

Stephen E Humphrey, Jennifer D Nahrgang, and Frederick P Morgeson. 2007. Integrating motivational, social, and contextual work design features: a meta- analytic summary and theoretical extension of the work design literature.Journal of applied psychology92, 5 (2007), 1332

2007

[52] [52]

Brittany Johnson, Christian Bird, Denae Ford, Nicole Forsgren, and Thomas Zimmermann. 2023. Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools. In2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 409–419

2023

[53] [53]

William A Kahn. 1990. Psychological conditions of personal engagement and disengagement at work.Academy of management journal33, 4 (1990), 692–724

1990

[54] [54]

Mansi Khemka and Brian Houck. 2024. Toward Effective AI Support for Devel- opers: A survey of desires and concerns.Commun. ACM67, 11 (2024), 42–49

2024

[55] [55]

Barbara A Kitchenham and Shari L Pfleeger. 2008. Personal opinion surveys. In Guide to advanced empirical software engineering. Springer, 63–92

2008

[56] [56]

Richard Koestner, Natasha Lekes, Theodore A Powers, and Emanuel Chicoine

[57] [57]

Attaining personal goals: self-concordance plus implementation intentions equals success.Journal of personality and social psychology83, 1 (2002), 231

2002

[58] [58]

Sukrit Kumar, Drishti Goel, Thomas Zimmermann, Brian Houck, Balasubra- manyan Ashok, and Chetan Bansal. 2025. Time Warp: The Gap Between Devel- opers’ Ideal vs Actual Workweeks in an AI-Driven Era. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 12–22

2025

[59] [59]

Stefano Lambiase, Gemma Catolino, Fabio Palomba, Filomena Ferrucci, and Daniel Russo. 2025. Exploring Individual Factors in the Adoption of LLMs for Specific Software Engineering Tasks.arXiv preprint arXiv:2504.02553(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[60] [60]

1991.Emotion and adaptation

Richard S Lazarus. 1991.Emotion and adaptation. Oxford University Press

1991

[61] [61]

John D Lee and Katrina A See. 2004. Trust in automation: Designing for appro- priate reliance.Human factors46, 1 (2004), 50–80

2004

[62] [62]

LePine, Nathan P

Jeffery A. LePine, Nathan P. Podsakoff, and Marcie A. LePine. 2005. A Meta- Analytic Test of the Challenge Stressor–Hindrance Stressor Framework: An Explanation for Inconsistent Relationships Among Stressors and Performance. Academy of Management Journal48, 5 (2005), 764–775. doi:10.5465/amj.2005. 18803921

work page doi:10.5465/amj.2005 2005

[63] [63]

Jennifer S Lerner and Philip E Tetlock. 1999. Accounting for the effects of accountability.Psychological bulletin125, 2 (1999), 255

1999

[64] [64]

Zijian Li, Luzhen Tang, Mengyu Xia, Xinyu Li, Naping Chen, Dragan Gašević, and Yizhou Fan. 2026. When LLMs Fall Short in Deductive Coding: Model Comparisons and Human-AI Collaboration Workflow Design. InProceedings of the 16th International Conference on Learning Analytics and Knowledge (LAK ’26). arXiv:2512.21041

work page arXiv 2026

[65] [65]

Hannah Limerick, David Coyle, and James W Moore. 2014. The experience of agency in human-computer interactions: a review.Frontiers in human neuro- science8 (2014), 643

2014

[66] [66]

Marjolein Lips-Wiersma and Lani Morris. 2009. Discriminating between ‘mean- ingful work’and the ‘management of meaning’.Journal of business ethics88, 3 (2009), 491–511

2009

[67] [67]

Brian Lubars and Chenhao Tan. 2019. Ask not what AI can do, but what AI should do: Towards a framework of task delegability.Advances in neural information processing systems32 (2019)

2019

[68] [68]

Russell A Matthews, Laura Pineault, and Yeong-Hyun Hong. 2022. Normalizing the use of single-item measures: Validation of the single-item compendium for organizational psychology.Journal of Business and Psychology37, 4 (2022), 639– 673

2022

[69] [69]

McKelvey and William Zavoina

Richard D. McKelvey and William Zavoina. 1975. A statistical model for the analysis of ordinal level dependent variables.Journal of Mathematical Sociology 4, 1 (1975), 103–120. doi:10.1080/0022250X.1975.9989847

work page doi:10.1080/0022250x.1975.9989847 1975

[70] [70]

John P Meyer and Natalie J Allen. 1991. A three-component conceptualization of organizational commitment.Human resource management review1, 1 (1991), Oregon State University, Northern Arizona University, & Microsoft Research, USA, 2026 Choudhuri et al. 61–89

1991

[71] [71]

Microsoft. 2024. Copilot. https://copilot.microsoft.com

2024

[72] [72]

Courtney Miller, Rudrajit Choudhuri, Mara Ulloa, Sankeerti Haniyur, Robert DeLine, Margaret-Anne Storey, Emerson Murphy-Hill, Christian Bird, and Jenna L Butler. 2025. ‘Maybe We Need Some More Examples:’ Individual and Team Drivers of Developer GenAI Tool Use.arXiv preprint arXiv:2507.21280(2025)

work page arXiv 2025

[73] [73]

Shinichi Nakagawa and Holger Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models.Methods in Ecology and Evolution4, 2 (2013), 133–142. doi:10.1111/j.2041-210x.2012.00261.x

work page doi:10.1111/j.2041-210x.2012.00261.x 2013

[74] [74]

Raja Parasuraman and Dietrich H Manzey. 2010. Complacency and bias in human use of automation: An attentional integration.Human factors52, 3 (2010), 381–410

2010

[75] [75]

Raja Parasuraman, Thomas B Sheridan, and Christopher D Wickens. 2000. A model for types and levels of human interaction with automation.IEEE Transac- tions on systems, man, and cybernetics-Part A: Systems and Humans30, 3 (2000), 286–297

2000

[76] [76]

Guilherme Vaz Pereira, Victoria Jackson, Rafael Prikladnicki, André van der Hoek, Luciane Fortes, Carolina Araújo, André Coelho, Ligia Chelli, and Diego Ramos

[77] [77]

In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 330–341

[78] [78]

Qualtrics. 2026. Qualtrics Survey Platform. https://www.qualtrics.com. Accessed June 11, 2026

2026

[79] [79]

Foyzur Rahman and Premkumar Devanbu. 2011. Ownership, experience and defects: a fine-grained study of authorship. InProceedings of the 33rd international conference on software engineering. 491–500

2011

[80] [80]

Ira J Roseman and Craig A Smith. 2001. Appraisal theory.Appraisal processes in emotion: Theory, methods, research(2001), 3–19

2001