To Copilot and Beyond: 22 AI Systems Developers Want Built
Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3
The pith
Developers want AI to absorb assembly tasks around coding while keeping the core craft under their own control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Developers wanted AI to absorb the assembly work surrounding their craft, never the craft itself. That boundary tracks where they locate professional identity, suggesting that the value of AI tooling may lie as much in where and how precisely it stops as in what it does. The survey data reveal a right-shift burden: developers need quality signals moved earlier in the workflow to match accelerating code generation, while demanding authority scoping, provenance tracking, uncertainty signaling, and least-privilege access in every system.
What carries the argument
Bounded delegation: the explicit pattern in which developers delegate surrounding assembly tasks to AI but retain authority over the core coding craft and professional judgment.
If this is right
- AI systems must embed quality signals earlier in the workflow to keep pace with faster code generation.
- Every desired system requires explicit authority scoping, provenance tracking, uncertainty signaling, and least-privilege access.
- The boundary of acceptable delegation is set by developers' sense of professional identity rather than by technical feasibility.
- Tool value depends as much on precise stopping points as on the tasks the AI performs.
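The constraint vocabulary in these bullets can be made concrete. As a minimal sketch (the task names and the policy class are invented for illustration, not taken from the paper), least-privilege means the delegation set starts empty, and authority scoping means the AI may act only on tasks the developer has explicitly granted, with craft tasks ungrantable by construction:

```python
from dataclasses import dataclass, field

# Hypothetical task taxonomy: "assembly" tasks are delegable,
# "craft" tasks stay with the developer. Names are illustrative only.
ASSEMBLY_TASKS = {"generate_boilerplate", "update_changelog", "draft_tests"}
CRAFT_TASKS = {"design_api", "choose_architecture", "write_core_logic"}

@dataclass
class DelegationPolicy:
    # Least-privilege: nothing is delegated until explicitly granted.
    granted: set = field(default_factory=set)

    def grant(self, task: str) -> None:
        if task in CRAFT_TASKS:
            raise PermissionError(f"{task!r} is craft work; it stays with the developer")
        self.granted.add(task)

    def may_run(self, task: str) -> bool:
        # Authority scoping: the agent acts only within the granted set.
        return task in self.granted

policy = DelegationPolicy()
policy.grant("draft_tests")
print(policy.may_run("draft_tests"))      # True
print(policy.may_run("write_core_logic")) # False
```

The point of the sketch is the asymmetry: assembly tasks are opt-in, while craft tasks cannot be opted in at all, which is where the bounded-delegation boundary lives.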
Where Pith is reading between the lines
- The same bounded-delegation logic may appear in other knowledge-work domains where identity is tied to judgment rather than output volume.
- Interfaces for these 22 systems will need persistent visual or textual markers that make the delegation boundary immediately legible to the user.
- As base models improve, the assembly-versus-craft distinction itself may shift, requiring periodic re-mapping of acceptable delegation limits.
Load-bearing premise
Self-reported desires from 860 Microsoft developers, processed through the described thematic analysis, accurately capture generalizable needs whose constraints will remain stable as AI capabilities advance.
What would settle it
A survey of developers outside Microsoft, or a repeat survey after a major jump in AI code-generation ability, would falsify the bounded-delegation claim if it showed widespread willingness to let AI perform core creative coding tasks.
read the original abstract
Developers spend roughly one-tenth of their workday writing code, yet most AI tooling targets that fraction. This paper asks what should be built for the rest. We surveyed 860 Microsoft developers to understand where they want AI support, and where they want it to stay out. Using a human-in-the-loop, multi-model council-based thematic analysis, we identify 22 AI systems that developers want built across five task categories. For each, we describe the problem it solves, what makes it hard to build, and the constraints developers place on its behavior. Our findings point to a growing right-shift burden in AI-assisted development: developers wanted systems that embed quality signals earlier in their workflow to keep pace with accelerating code generation, while enforcing explicit authority scoping, provenance, uncertainty signaling, and least-privilege access throughout. This tension reveals a pattern we call "bounded delegation": developers wanted AI to absorb the assembly work surrounding their craft, never the craft itself. That boundary tracks where they locate professional identity, suggesting that the value of AI tooling may lie as much in where and how precisely it stops as in what it does.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a survey of 860 Microsoft developers and applies human-in-the-loop, multi-model council-based thematic analysis to identify 22 desired AI systems across five task categories. It describes the problems each system would solve, implementation challenges, and developer-imposed constraints (authority scoping, provenance, uncertainty signaling, least-privilege access), then interprets the results as evidence of a 'bounded delegation' pattern in which developers want AI to handle surrounding assembly work but not core professional craft.
Significance. If the taxonomy and bounded-delegation pattern hold beyond the sampled population, the work would be significant for AI-for-SE research by shifting attention from code-generation tools to broader workflow support, earlier quality-signal embedding, and explicit boundary mechanisms. The empirical grounding in developer self-reports and the explicit enumeration of constraints provide concrete design guidance that could influence both research prototypes and commercial tooling.
major comments (3)
- [Abstract] Abstract and implied Methods section: the description of the survey and thematic analysis states the sample size (860) and the use of a human-in-the-loop multi-model council but supplies no information on question design, response rate, inter-rater reliability, or any validation steps against external populations; without these details the 22-system taxonomy and the bounded-delegation interpretation rest on unexamined methodological choices.
- [Abstract] Abstract and Results: the central claim that the 22 systems and 'bounded delegation' pattern reflect general developer needs is undercut by the exclusive Microsoft sample; Microsoft-specific tooling, workflows, and culture may systematically shape the reported desires, so the taxonomy and the stability of the listed constraints (authority scoping, provenance, uncertainty signaling) cannot be assumed to transfer without additional evidence or explicit limitation statements.
- [Discussion] Discussion of bounded delegation: the interpretation that developers locate professional identity at the boundary between assembly work and craft relies on cross-sectional self-reports; the manuscript provides no longitudinal data or robustness checks to support the claim that these boundaries will remain stable as AI capabilities advance.
minor comments (1)
- [Abstract] Abstract: the five task categories are referenced but not enumerated; listing them would improve immediate readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and limitations of our work. We address each major point below and indicate planned revisions to the manuscript.
read point-by-point responses
Referee: [Abstract] Abstract and implied Methods section: the description of the survey and thematic analysis states the sample size (860) and the use of a human-in-the-loop multi-model council but supplies no information on question design, response rate, inter-rater reliability, or any validation steps against external populations; without these details the 22-system taxonomy and the bounded-delegation interpretation rest on unexamined methodological choices.
Authors: We agree that greater methodological transparency is needed. The full manuscript contains a Methods section, but we will expand it in revision to detail the survey question design (including exact prompts and branching logic), the achieved response rate, the implementation of the multi-model council (including how disagreements were resolved), any quantitative inter-rater reliability metrics, and steps taken to validate themes against external developer populations or prior literature. These additions will allow readers to evaluate the taxonomy and interpretation more rigorously. revision: yes
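For readers unfamiliar with the metric class this response promises to report, chance-corrected agreement statistics such as Cohen's kappa are the usual choice, though the manuscript does not say which one it uses. A minimal sketch with illustrative labels only, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

# Illustrative coding decisions, not data from the study.
a = ["assembly", "assembly", "craft", "assembly", "craft", "assembly"]
b = ["assembly", "craft", "craft", "assembly", "craft", "assembly"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```

Here the raters agree on 5 of 6 items (0.833 observed), chance agreement is 0.5 given their label frequencies, and kappa corrects the former by the latter.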
Referee: [Abstract] Abstract and Results: the central claim that the 22 systems and 'bounded delegation' pattern reflect general developer needs is undercut by the exclusive Microsoft sample; Microsoft-specific tooling, workflows, and culture may systematically shape the reported desires, so the taxonomy and the stability of the listed constraints (authority scoping, provenance, uncertainty signaling) cannot be assumed to transfer without additional evidence or explicit limitation statements.
Authors: We accept this as a valid limitation. Although Microsoft employs developers across many product areas and geographies, the sample is organizationally bounded. In the revised manuscript we will add an explicit Limitations subsection and strengthen the Discussion to state that the taxonomy and constraint patterns are scoped to this population, note potential influences from internal tooling and culture, and refrain from claiming broad generalizability. We will retain the bounded-delegation framing as an observation within the sampled context rather than a universal claim. revision: partial
Referee: [Discussion] Discussion of bounded delegation: the interpretation that developers locate professional identity at the boundary between assembly work and craft relies on cross-sectional self-reports; the manuscript provides no longitudinal data or robustness checks to support the claim that these boundaries will remain stable as AI capabilities advance.
Authors: The bounded-delegation pattern is presented as an interpretive synthesis of the cross-sectional self-reports we collected, not as a longitudinal prediction. We will revise the Discussion to make this distinction explicit, acknowledge the absence of longitudinal or robustness data, and frame the finding as a snapshot of current developer preferences and identity boundaries. We will also add a forward-looking paragraph suggesting that future studies could track whether these boundaries shift with AI progress, while preserving the value of the present evidence for immediate design implications. revision: partial
Circularity Check
No circularity: empirical survey and thematic analysis with no derivations or self-referential reductions
full rationale
The paper reports results from a survey of 860 Microsoft developers followed by human-in-the-loop multi-model thematic analysis to surface 22 desired AI systems and the 'bounded delegation' pattern. No equations, fitted parameters, or derivation chains exist that could reduce outputs to inputs by construction. All claims rest directly on the collected self-reported responses and the coding process applied to them; the named pattern is an interpretive label for observed themes rather than a renamed or fitted input. No load-bearing self-citations or uniqueness theorems are invoked. The work is therefore self-contained as a descriptive empirical study.
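The council mechanism this rationale leans on is not specified in detail anywhere on this page. One plausible minimal form, sketched here purely as an assumption, is majority voting across model coders with human escalation on disagreement:

```python
from collections import Counter

def council_code(response: str, coders, quorum: float = 2 / 3):
    """Ask each model coder for a theme label; escalate low-consensus items.

    Hypothetical sketch of a human-in-the-loop, multi-model council step;
    the paper's actual protocol may differ.
    """
    votes = Counter(coder(response) for coder in coders)
    label, count = votes.most_common(1)[0]
    if count / len(coders) >= quorum:
        return label, "auto"          # council agrees strongly enough
    return None, "escalate-to-human"  # human-in-the-loop resolves

# Toy stand-in coders; real ones would call different LLMs.
coders = [lambda r: "assembly", lambda r: "assembly", lambda r: "craft"]
print(council_code("automate my changelog updates", coders))
# ('assembly', 'auto')
```

The design choice worth noting is that disagreement is surfaced rather than averaged away: items the council cannot settle go to a human coder, which is what keeps the loop human-in-the-loop.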
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Self-reported preferences collected via survey accurately reflect developers' true desires for AI system behavior and boundaries.
- domain assumption: The human-in-the-loop multi-model council thematic analysis produces unbiased and complete categorization of responses into the 22 systems.
Reference graph
Works this paper leans on
- [1] [n. d.]. Supplemental Package. https://cabird.github.io/22-systems-devs-want/
- [2] Sadia Afroz, Zixuan Feng, Katie Kimura, Bianca Trinkenreich, Igor Steinmacher, and Anita Sarma. 2025. Developer Productivity with GenAI. arXiv preprint arXiv:2510.24265 (2025).
- [3] Blake A Allan, Cassondra Batz-Barbarich, Haley M Sterling, and Louis Tay. 2019. Outcomes of meaningful work: A meta-analysis. Journal of Management Studies 56, 3 (2019), 500–528.
- [4] David Autor. 2022. The labor market impacts of technological change: From unbridled enthusiasm to qualified optimism to vast uncertainty. Technical Report. National Bureau of Economic Research.
- [5] Catherine Bailey, Ruth Yeoman, Adrian Madden, Marc Thompson, and Gary Kerridge. 2019. A review of the empirical literature on meaningful work: Progress and research agenda. Human Resource Development Review 18, 1 (2019), 83–113.
- [6] Sebastian Baltes, Marc Cheong, and Christoph Treude. 2026. "An Endless Stream of AI Slop": The Growing Burden of AI-Assisted Software Development. arXiv preprint arXiv:2603.27249 (2026).
- [7] Leonardo Banh, Florian Holldack, and Gero Strobel. 2025. Copiloting the future: How generative AI transforms Software Engineering. Information and Software Technology 183 (2025), 107751.
- [8] Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 20, 6 (2022), 35–57.
- [9] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
- [10] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
- [11] Virginia Braun and Victoria Clarke. 2022. Conceptual and design thinking for thematic analysis. Qualitative Psychology 9, 1 (2022), 3.
- [12] Erik Brynjolfsson. 2022. The Turing trap: The promise & peril of human-like artificial intelligence. Daedalus 151, 2 (2022), 272–287.
- [13] Jenna Butler, Jina Suh, Sankeerti Haniyur, and Constance Hadley. 2025. Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 319–329.
- [14] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
- [15] Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, and Denae Ford. 2023. "It would work for me too": How Online Communities Shape Software Developers' Trust in AI-Powered Code Generation Tools. ACM Transactions on Interactive Intelligent Systems (2023).
- [16]
- [17] Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Guides Our Choices? Modeling Developers' Trust and Behavioral Intentions Towards GenAI. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 1691–1703.
- [18] Rudrajit Choudhuri, Bianca Trinkenreich, Rahul Pandita, Eirini Kalliamvakou, Igor Steinmacher, Marco Gerosa, Christopher Sanchez, and Anita Sarma. 2025. What Needs Attention? Prioritizing Drivers of Developers' Trust and Adoption of Generative AI. arXiv preprint arXiv:2505.17418 (2025).
- [19] John W Creswell and Cheryl N Poth. 2016. Qualitative Inquiry and Research Design: Choosing Among Five Approaches. Sage Publications.
- [20]
- [21] Bent Flyvbjerg. 2006. Five misunderstandings about case-study research. Qualitative Inquiry 12, 2 (2006), 219–245.
- [22] Natasa Gisev, J Simon Bell, and Timothy F Chen. 2013. Interrater agreement and interrater reliability: key concepts, approaches, and applications. Research in Social and Administrative Pharmacy 9, 3 (2013), 330–338.
- [23] Kilem L Gwet. 2014. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. Advanced Analytics, LLC.
- [24] J Richard Hackman and Greg R Oldham. 1976. Motivation through the design of work: Test of a theory. Organizational Behavior and Human Performance 16, 2 (1976), 250–279.
- [25] Brittany Johnson, Christian Bird, Denae Ford, Ebtesam Al Haque, Nicole Forsgren, and Thomas Zimmermann. [n. d.]. Facilitating Trust in AI-assisted Software Tools. ACM Transactions on Software Engineering and Methodology ([n. d.]).
- [26] Brittany Johnson, Christian Bird, Denae Ford, Nicole Forsgren, and Thomas Zimmermann. 2023. Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 409–419.
- [27] Eirini Kalliamvakou. 2024. A developer's second brain: Reducing complexity through partnership with AI.
- [28] Mansi Khemka and Brian Houck. 2024. Toward Effective AI Support for Developers: A survey of desires and concerns. Commun. ACM 67, 11 (2024), 42–49.
- [29]
- [30] Adam Kuper. 2004. The Social Science Encyclopedia. Routledge.
- [31] Stefano Lambiase, Gemma Catolino, Fabio Palomba, Filomena Ferrucci, and Daniel Russo. 2025. Exploring Individual Factors in the Adoption of LLMs for Specific Software Engineering Tasks. arXiv preprint arXiv:2504.02553 (2025).
- [32] Richard S Lazarus. 1991. Emotion and Adaptation. Oxford University Press.
- [33] Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. 2022. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110 (2022).
- [35] Yue Liu, Ratnadira Widyasari, Yanjie Zhao, Ivana Clairine Irsan, and David Lo. 2026. Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild. arXiv preprint arXiv:2603.28592 (2026).
- [37] André N Meyer, Earl T Barr, Christian Bird, and Thomas Zimmermann. 2019. Today was a good day: The daily life of software developers. IEEE Transactions on Software Engineering 47, 5 (2019), 863–880.
- [38] Courtney Miller, Rudrajit Choudhuri, Mara Ulloa, Sankeerti Haniyur, Robert DeLine, Margaret-Anne Storey, Emerson Murphy-Hill, Christian Bird, and Jenna L Butler. 2025. "Maybe We Need Some More Examples:" Individual and Team Drivers of Developer GenAI Tool Use. arXiv preprint arXiv:2507.21280 (2025).
- [39] Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano, and Jeffrey T Hancock. 2025. AI-generated "workslop" is destroying productivity. Harvard Business Review (2025).
- [40] Ike Obi, Jenna Butler, Sankeerti Haniyur, Brian Hassan, Margaret-Anne Storey, and Brendan Murphy. 2025. Identifying factors contributing to "bad days" for software developers: A mixed-methods study. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 1–11.
- [41] Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2022. Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 754–768.
- [42] Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2025. Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. Commun. ACM 68, 2 (2025), 96–105.
- [43] Guilherme Vaz Pereira, Victoria Jackson, Rafael Prikladnicki, André van der Hoek, Luciane Fortes, Carolina Araújo, André Coelho, Ligia Chelli, and Diego Ramos. 2025. Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 330–341.
- [45] Teade Punter, Marcus Ciolkowski, Bernd Freimut, and Isabel John. 2003. Conducting on-line surveys in software engineering. In 2003 International Symposium on Empirical Software Engineering (ISESE 2003). IEEE, 80–88.
- [46] Ira J Roseman and Craig A Smith. 2001. Appraisal theory. Appraisal Processes in Emotion: Theory, Methods, Research (2001), 3–19.
- [47] Daniel Russo. 2024. Navigating the complexity of generative AI adoption in software engineering. ACM Transactions on Software Engineering and Methodology (2024).
- [48] Hope Schroeder, Marianne Aubin Le Quéré, Casey Randazzo, David Mimno, and Sarita Schoenebeck. 2025. Large language models in qualitative research: uses, tensions, and intentions. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17.
- [49] Ben Shneiderman. 2020. Human-centered artificial intelligence: Reliable, safe & trustworthy. International Journal of Human–Computer Interaction 36, 6 (2020), 495–504.
- [50] Margaret-Anne Storey, Thomas Zimmermann, Christian Bird, Jacek Czerwonka, Brendan Murphy, and Eirini Kalliamvakou. 2019. Towards a theory of software developer job satisfaction and perceived productivity. IEEE Transactions on Software Engineering 47, 10 (2019), 2125–2142.
- [51] Robert H Tai, Lillian R Bentley, Xin Xia, Jason M Sitt, Sarah C Fankhauser, Ana M Chicas-Mosier, and Barnas G Monteith. 2024. An examination of the use of large language models to aid analysis of textual data. International Journal of Qualitative Methods 23 (2024), 16094069241231168.
- [52] Eric Lansdown Trist and Kenneth W Bamforth. 1951. Some social and psychological consequences of the longwall method of coal-getting: An examination of the psychological situation and defences of a work group in relation to the social structure and technological content of the work system. Human Relations 4, 1 (1951), 3–38.
- [53]
- [54] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
- [55] Shuai Wu, Xue Li, Yanna Feng, Yufang Li, and Zhijun Wang. 2026. Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus. arXiv preprint arXiv:2604.02923 (2026).
- [56]
- [57] Albert Ziegler, Eirini Kalliamvakou, X Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian. 2024. Measuring GitHub Copilot's Impact on Productivity. Commun. ACM (2024).