The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study

Annie Vella; Kelly Blincoe

arxiv: 2605.23135 · v1 · pith:N4PT7R5Xnew · submitted 2026-05-22 · 💻 cs.SE

Pith reviewed 2026-05-25 04:07 UTC · model grok-4.3

classification 💻 cs.SE

keywords AI coding assistants · software engineering · developer experience · productivity · longitudinal study · supervisory engineering work · productivity-experience paradox

The pith

Longitudinal surveys reveal AI coding assistants shift engineers toward supervisory verification tasks while eroding aspects of developer experience.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks professional software engineers through questionnaires six months apart and finds consistent reports of reduced time on tasks like writing code. It identifies a shift from creation-focused work to verification and proposes supervisory engineering work as the new category of directing, evaluating, and correcting AI outputs. Productivity perceptions remain largely positive and stable, yet the matched group shows a near-doubling in reports of worsened developer experience, especially declining flow state and rising cognitive load. The findings indicate that AI assistants alter both the structure of software engineering work and how engineers experience performing it.

Core claim

Through a longitudinal mixed-methods investigation with 158 participants at the first time point and a matched cohort of 95 at the second, the study documents that 82 percent of engineers report spending less time writing code and a broader reallocation away from creation toward verification activities. It introduces supervisory engineering work as the emerging category encompassing direction, evaluation, and correction of AI-generated output. Productivity perceptions hold steady with 84 percent reporting improvement at both points, but among matched participants the share reporting worsened developer experience in at least one dimension rises from 14 percent to 27 percent, with flow state and cognitive load eroding while feedback loops improve.

What carries the argument

The matched longitudinal cohort of 95 participants, combined with the newly proposed category of supervisory engineering work.

If this is right

Engineers spend measurably less time on code creation and more on verification of AI output.
A distinct category of supervisory engineering work emerges that requires new skills in evaluating and correcting AI suggestions.
Perceived productivity gains remain stable even as reports of diminished flow and higher cognitive load increase.
Feedback loops improve while other dimensions of developer experience decline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Training programs may need to emphasize verification and oversight skills rather than pure coding fluency.
Team processes could be redesigned to protect flow state during AI-assisted work sessions.
Longer-term studies might track whether the observed experience erosion stabilizes or leads to higher turnover.

Load-bearing premise

Self-reported changes in time allocation, productivity, and developer experience accurately reflect the effects of AI assistants without meaningful distortion from participant self-selection or attrition.

What would settle it

Objective measurement of task durations or cognitive load in a controlled setting that shows no reduction in coding time or no erosion of flow state would contradict the self-reported patterns.

Figures

Figures reproduced from arXiv: 2605.23135 by Annie Vella, Kelly Blincoe.

**Figure 2.** Distribution of initial impressions of AI coding assistants at Q1.

**Figure 3.** Distribution of anticipated disappointment if AI coding assistants were removed at Q2.

**Figure 4.** Primary concern distribution at Q1 and Q2. “Other” excluded from analysis.

**Figure 5.** Task focus shifts across time points. Excerpt from the accompanying text: "… at Q2), with 82% of participants reporting less time by Q2 and only 2% reporting more. Refactoring code and testing also showed means below neutral, with many reporting spending less time on these activities. In contrast, designing and debugging were near neutral at both time points. Reviewing code was the only task with means above the neutral midpoint at both time po…"

**Figure 6.** Perceived developer experience impact across time points.

**Figure 7.** Developer experience perception transitions between Q1 and Q2.

**Figure 8.** Perceived productivity impact across time points.

**Figure 9.** Productivity perception transitions between Q1 and Q2.

Original abstract

AI coding assistants have become prolific in recent years. Through a longitudinal mixed-methods investigation, we examined how professional software engineers perceive the effects of AI coding assistants in regard to task focus, developer experience, and productivity. Two questionnaires were administered six months apart, yielding 158 eligible participants at the first time point, 101 at the second, and a matched longitudinal cohort of 95. Participants reported spending less time on most development tasks, with 82% reporting less on writing code. We find a broader shift in focus from creation to verification activities. We propose a new category of work we term supervisory engineering work, encompassing the direction, evaluation, and correction of AI output. We also identified a productivity-experience paradox: productivity perceptions held stable, with 84% reporting improvement at both time points, yet among matched participants, the proportion reporting worsened developer experience in at least one dimension nearly doubled from 14% to 27%, with flow state and cognitive load eroding while feedback loops improved. These findings suggest that AI coding assistants are impacting both the nature of software engineering work and how engineers experience it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Longitudinal survey data on AI coding assistants show stable productivity perceptions but rising developer-experience complaints, yet roughly 40% attrition between the first wave (158 participants) and the matched cohort (95) leaves the rise from 14% to 27% in worsened experience open to bias.

The paper's core finding is a within-subject shift over six months: productivity perceptions stay high at 84% reporting gains, but the share of matched participants noting worse developer experience in at least one area rises from 14% to 27%, with flow state and cognitive load worsening while feedback loops improve. They also document less time on code writing and more on verification, and they label the new oversight role as supervisory engineering work. That longitudinal framing and the paradox are the main additions beyond existing cross-sectional surveys. The repeated measures from working engineers give the numbers some grounding that single-time-point studies lack. The authors earn credit for running the second wave and keeping a matched subset of 95.

The soft spot is exactly the one the stress-test flags. Dropping from 158 to 95 matched participants without baseline comparisons or weighting means selective attrition could drive the experience change even if nothing real happened in the population. Self-reported items on flow and load are especially easy to bias when people who are struggling are more likely to drop out. The abstract gives percentages and cohort sizes but no questionnaire validation, response rates by item, or controls for usage intensity, so it is hard to judge how robust the 82% less-time-on-coding figure really is.

This work is for researchers who follow AI adoption in software teams and want early signals on experience side-effects. A reader who needs quick empirical trends on task reallocation will find usable numbers here, though anyone planning interventions would need stronger evidence on the experience side.

It deserves a serious referee because the design is longitudinal and the topic is timely; the referee can ask for the missing attrition checks and survey details without starting from zero. I would send it for review with those specific requests rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a longitudinal mixed-methods survey of professional software engineers using AI coding assistants. Two questionnaires administered six months apart yielded 158 participants at T1 and 101 at T2, with a matched cohort of 95. Key claims include reduced time on coding tasks (82% report less time writing code), a shift toward verification activities, introduction of 'supervisory engineering work' as a new category, stable productivity perceptions (84% reporting improvement at both time points), and a 'productivity-experience paradox' in which the proportion of matched participants reporting worsened developer experience in at least one dimension rose from 14% to 27%, with flow state and cognitive load worsening while feedback loops improved.

Significance. If the longitudinal within-subject changes are robust, the work provides rare empirical evidence on how AI assistants alter both task allocation and subjective developer experience over time. The mixed-methods design and matched cohort allow direct observation of change rather than cross-sectional snapshots, which is a strength for claims about evolving impacts. The proposed 'supervisory engineering work' category could usefully frame future research on verification and oversight activities.

major comments (2)

[Abstract / Results (matched cohort)] Abstract and Results section on the matched cohort: the central productivity-experience paradox rests on the within-subject increase from 14% to 27% reporting worsened developer experience. With 63 dropouts between the initial 158 and the matched 95, the manuscript does not report baseline (T1) comparisons of DX or productivity perceptions between retained and attrited participants, nor any inverse-probability weighting or sensitivity analysis. This omission leaves differential attrition as a plausible alternative explanation for the observed shift.
[Methods] Methods section: the abstract and results report percentages and cohort sizes but provide no details on questionnaire item wording, validation, statistical tests used for the 14%-to-27% change, or controls for AI usage intensity, experience level, or other covariates. Without these, it is not possible to assess whether the reported DX erosion is robust or sensitive to measurement choices.
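
The attrition concern in the first major comment can be made concrete with simple arithmetic. The Python sketch below is illustrative and not part of the manuscript. It computes the dropout share between the 158 first-wave participants and the 95 matched ones, then brackets what the Q2 "worsened" share across all 158 could have been if none, or all, of the 63 missing participants had reported worsening. The count of 26 is an assumption back-calculated from the rounded 27% of 95, and both function names are hypothetical.

```python
def attrition_rate(n_baseline: int, n_retained: int) -> float:
    """Share of baseline participants missing from the retained cohort."""
    return (n_baseline - n_retained) / n_baseline


def worst_case_bounds(n_baseline: int, n_retained: int, k_retained_positive: int):
    """Bracket the full-sample proportion when dropout outcomes are unknown.

    Lower bound: no dropout would have reported the outcome.
    Upper bound: every dropout would have reported the outcome.
    """
    n_missing = n_baseline - n_retained
    lower = k_retained_positive / n_baseline
    upper = (k_retained_positive + n_missing) / n_baseline
    return lower, upper


# 158 and 95 come from the abstract; 26 is an ASSUMED count (about 27% of 95).
rate = attrition_rate(158, 95)
low, high = worst_case_bounds(158, 95, 26)
print(f"attrition: {rate:.1%}; worsened-DX share bounded in [{low:.1%}, {high:.1%}]")
```

A wide bracket does not show that bias occurred. It only shows how much room the missing responses leave when nothing is known about the participants who dropped out.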

minor comments (2)

[Abstract / Discussion] The definition and operationalization of 'supervisory engineering work' is introduced in the abstract but would benefit from an explicit coding scheme or example items in the methods or results to allow replication.
[Results] Table or figure presenting the matched-cohort DX changes should include exact item wording, response scales, and confidence intervals or p-values for the reported proportions.
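
The request for confidence intervals could be met with a Wilson score interval for each reported proportion. What follows is a minimal sketch, assuming counts of 13 and 26 out of 95 (back-calculated from the rounded 14% and 27%, not taken from the paper); `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion k/n.

    z defaults to the normal quantile for a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# ASSUMED counts: 13/95 is about 14% (Q1), 26/95 is about 27% (Q2).
for label, k in (("Q1", 13), ("Q2", 26)):
    lo, hi = wilson_interval(k, 95)
    print(f"{label}: {k}/95 = {k / 95:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```

Because both proportions come from the same participants, each interval describes one time point on its own. Overlap between them is not a valid test of the paired change; a paired procedure is needed for that.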

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their careful review and for identifying key areas where additional transparency is needed regarding the longitudinal cohort and methodological details. We address each major comment below.

Referee: [Abstract / Results (matched cohort)] Abstract and Results section on the matched cohort: the central productivity-experience paradox rests on the within-subject increase from 14% to 27% reporting worsened developer experience. With 63 dropouts between the initial 158 and the matched 95, the manuscript does not report baseline (T1) comparisons of DX or productivity perceptions between retained and attrited participants, nor any inverse-probability weighting or sensitivity analysis. This omission leaves differential attrition as a plausible alternative explanation for the observed shift.

Authors: We acknowledge that differential attrition is a plausible alternative explanation and that the manuscript does not include baseline comparisons or sensitivity analyses. The survey was administered anonymously to protect participant privacy and encourage candid responses on sensitive topics such as developer experience; consequently, no identifying information exists that would permit direct comparison of T1 DX or productivity scores between the 95 matched participants and the 63 who did not complete T2. We will revise the manuscript to state this limitation explicitly, to present the matched-cohort results with appropriate caveats, and to note that inverse-probability weighting was not feasible given the data collected. No new empirical analysis can be added. revision: partial
Referee: [Methods] Methods section: the abstract and results report percentages and cohort sizes but provide no details on questionnaire item wording, validation, statistical tests used for the 14%-to-27% change, or controls for AI usage intensity, experience level, or other covariates. Without these, it is not possible to assess whether the reported DX erosion is robust or sensitive to measurement choices.

Authors: We agree that the Methods section requires expansion for reproducibility. In the revised manuscript we will add: (1) verbatim wording of the developer-experience and productivity items, (2) information on any pilot testing or validation steps performed, (3) the exact statistical procedure used to evaluate the paired change from 14% to 27% (McNemar’s test for paired proportions), and (4) exploratory analyses that control for self-reported AI usage intensity and years of professional experience. The primary analysis remained descriptive to preserve power in the modest matched sample; these additions will clarify the robustness of the findings. revision: yes
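
McNemar's test, which the authors name for the paired change, uses only the discordant pairs: participants whose answer differs between Q1 and Q2. A sketch of the exact (binomial) version follows. The discordant counts passed in are hypothetical placeholders, since the manuscript's transition table is not reproduced here, and `mcnemar_exact` is an illustrative name.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: pairs that moved from 'worsened' at Q1 to 'not worsened' at Q2.
    c: pairs that moved from 'not worsened' at Q1 to 'worsened' at Q2.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts, chosen only so that c - b = 13, which matches
# a net move from an assumed 13 to an assumed 26 'worsened' reports among 95.
p_value = mcnemar_exact(b=5, c=18)
print(f"exact McNemar p-value (hypothetical counts): {p_value:.4f}")
```

The exact form is used here because it stays valid when the number of discordant pairs is small, as it may be in a matched sample of 95.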

standing simulated objections not resolved

Baseline (T1) comparisons between retained and attrited participants cannot be performed because the survey was anonymous and no identifying or linking data were collected.

Circularity Check

0 steps flagged

No circularity: empirical survey with direct self-reports and no derivations

This is a longitudinal mixed-methods survey study relying on participant questionnaires at two time points. Claims about task time shifts, supervisory engineering work, and the productivity-experience paradox are presented as direct summaries of self-reported data from the matched cohort of 95. No equations, first-principles derivations, fitted parameters, or mathematical predictions appear in the abstract or described structure. The productivity-experience paradox is an observed pattern in the responses, not a constructed equivalence. Attrition and self-selection are methodological limitations but do not constitute circularity in any derivation chain. The paper is self-contained against external benchmarks as an empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on survey methodology assumptions and introduce one new conceptual entity without external validation.

axioms (1)

Domain assumption: Participant self-reports on time allocation, productivity, and experience dimensions are reliable indicators of actual changes induced by AI tools.
The study depends entirely on questionnaire responses without objective performance metrics or observational data.

invented entities (1)

supervisory engineering work (no independent evidence)
Purpose: Categorize the activities of directing, evaluating, and correcting AI-generated code.
Proposed as a new category based on observed shifts in task focus.

[1]

Adam Alami and Neil Ernst. 2025. Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers. In 2025 IEEE/ACM 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE). 63–74. doi:10.1109/CHASE66643.2025.00016

[2]

Sebastian Baltes and Paul Ralph. 2022. Sampling in software engineering research: a critical review and guidelines. Empirical Software Engineering 27, 4 (April 2022). doi:10.1007/s10664-021-10072-8

[3]

Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 20, 6 (Dec. 2022), 35–57. doi:10.1145/3582083

[4]

Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (Jan. 2006), 77–101. doi:10.1191/1478088706qp063oa

[5]

Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise and Health 11, 4 (Aug. 2019), 589–597. doi:10.1080/2159676x.2019.1628806

[6]

Frederick P. Brooks. 1987. No Silver Bullet Essence and Accidents of Software Engineering. Computer 20, 4 (April 1987), 10–19. doi:10.1109/mc.1987.1663532

[7]

Jenna Butler, Jina Suh, Sankeerti Haniyur, and Constance Hadley. 2025. Dear Diary: A Randomized Controlled Trial of Generative AI Coding Tools in the Workplace. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 319–329. doi:10.1109/ICSE-SEIP66354.2025.00034

[8]

Sayan Chatterjee, Ching Louis Liu, Gareth Rowland, and Tim Hogarth. 2024. The Impact of AI Tool on Engineering at ANZ Bank: An Empirical Study on GitHub Copilot within Corporate Environment. doi:10.48550/arXiv.2402.05636

[9]

Tianyi Chen. 2024. The Impact of AI-Pair Programmers on Code Quality and Developer Satisfaction: Evidence from TiMi studio. In 2024 International Conference on Generative Artificial Intelligence and Information Security (GAIIS). 201–205. doi:10.1145/3665348.3665383

[10]

John W. Creswell and Vicki L. Plano Clark. 2018. Designing and Conducting Mixed Methods Research (third ed.). SAGE, Thousand Oaks, California.

[11]

Zheyuan (Kevin) Cui, Mert Demirer, Sonia Jaffe, Leon Musolff, Sida Peng, and Tobias Salz. 2025. The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers. doi:10.2139/ssrn.4945566

[12]

Sarah D’Angelo, Ambar Murillo, Satish Chandra, and Andrew Macvean. 2024. What Do Developers Want From AI? IEEE Software 41, 3 (May 2024), 11–15. doi:10.1109/MS.2024.3363538

[13]

Paul Denny, Viraj Kumar, and Nasser Giacaman. 2023. Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education. Association for Computing Machinery, 1136–1142. doi:10.1145/3545945.3569823

[14]

Kayla DePalma, Izabel Miminoshvili, Chiara Henselder, Kate Moss, and Eman Abdullah AlOmar. 2024. Exploring ChatGPT’s code refactoring capabilities: An empirical study. Expert Systems with Applications 249 (Sept. 2024), 123602. doi:10.1016/j.eswa.2024.123602

[15]

DORA. 2025. State of AI-assisted Software Development. Technical Report. Google. https://dora.dev/research/2025/dora-report/

[16]

Manuel Hoffmann, Sam Boysel, Frank Nagle, Sida Peng, and Kevin Xu. 2025. Generative AI and the Nature of Work. doi:10.2139/ssrn.5007084

[17]

Brian Houck, Travis Lowdermilk, Cody Beyer, Steven Clarke, and Ben Hanrahan. 2025. The SPACE of AI: Real-World Lessons on AI’s Impact on Developers. doi:10.48550/arXiv.2508.00178

[18]

Anna Y. Q. Huang, Cheng-Yan Lin, Sheng-Yi Su, and Stephen J. H. Yang. 2025. The impact of GenAI-enabled coding hints on students’ programming performance and cognitive load in an SRL-based Python course. British Journal of Educational Technology 56, 5 (2025), 1942–1972. doi:10.1111/bjet.13589

[19]

Sarah Inman, Ambar Murillo, Sarah D’Angelo, Adam Brown, and Collin Green. 2025. Seamful AI for Creative Software Engineering. IEEE Software 42 (2025). doi:10.1109/MS.2025.3534085

[20]

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? doi:10.48550/arXiv.2310.06770

[21]

Nils Knoth, Antonia Tolzin, Andreas Janson, and Jan Marco Leimeister. 2024. AI literacy and its implications for prompt engineering strategies. Computers and Education: Artificial Intelligence 6 (2024), 100225. doi:10.1016/j.caeai.2024.100225

[22]

Mohammad Amin Kuhail, Sujith Samuel Mathew, Ashraf Khalil, Jose Berengueres, and Syed Jawad Hussain Shah. 2024. "Will I be replaced?" Assessing ChatGPT’s effect on software development and programmer perceptions of AI tools. Science of Computer Programming 235 (July 2024), 103111. doi:10.1016/j.scico.2024.103111

[23]

Anand Kumar, Vishal Khare, Deepak Sharma, Satyam Kumar, Vijay Saini, Anshul Yadav, Sachendra Jain, Ankit Rana, Pratham Verma, Vaibhav Meena, and Avinash Edubilli. 2025. Intuition to Evidence: Measuring AI’s True Impact on Developer Productivity. doi:10.48550/arXiv.2509.19708

[24]

Eve Martina Lange, Åsa Cajander, and Maria Normark. 2025. Exploring Flow in IT Professionals’ Use of AI-Integrated Tools: Insights from Interviews. In Artificial Intelligence in HCI, Helmut Degen and Stavroula Ntoa (Eds.). Springer Nature Switzerland, Cham, 44–58. doi:10.1007/978-3-031-93429-2_3

[25]

Jenny T. Liang, Chenyang Yang, and Brad A. Myers. 2024. A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3597503.3608128

[26]

André N. Meyer, Laura E. Barton, Gail C. Murphy, Thomas Zimmermann, and Thomas Fritz. 2017. The Work Life of Developers: Activities, Switches, and Perceived Productivity. IEEE Transactions on Software Engineering 43, 12 (2017), 1178–1193. doi:10.1109/TSE.2017.2656886

[27]

André N. Meyer, Thomas Fritz, Gail C. Murphy, and Thomas Zimmermann. 2014. Software developers’ perceptions of productivity. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). Association for Computing Machinery, New York, NY, USA, 19–29. doi:10.1145/2635868.2635892

[28]

Hal Mooz and Kevin Forsberg. 2006. 10.2.1 The Dual Vee - Illuminating the Management of Complexity. INCOSE International Symposium 16, 1 (2006), 1368–1381. doi:10.1002/j.2334-5837.2006.tb02819.x

[29]

Arghavan Moradi Dakhel, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, and Zhen Ming (Jack) Jiang. 2023. GitHub Copilot AI pair programmer: Asset or Liability? Journal of Systems and Software 203 (Sept. 2023), 111734. doi:10.1016/j.jss.2023.111734

[30]

Hussein Mozannar, Gagan Bansal, Adam Fourney, and Eric Horvitz. 2024. Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, New York, NY, USA, 1–16. doi:10.1145/3613904.3641936

[31]

Emerson Murphy-Hill, Ciera Jaspan, Caitlin Sadowski, David Shepherd, Michael Phillips, Collin Winter, Andrea Knight, Edward Smith, and Matthew Jorde. 2021. What Predicts Software Developers’ Productivity? IEEE Transactions on Software Engineering 47, 3 (March 2021), 582–594. doi:10.1109/TSE.2019.2900308

[32]

Nhan Nguyen and Sarah Nadi. 2022. An empirical evaluation of GitHub copilot’s code suggestions. In Proceedings of the 19th International Conference on Mining Software Repositories. ACM, Pittsburgh, Pennsylvania, 1–5. doi:10.1145/3524842.3528470

[33]

Abi Noda, Margaret-Anne Storey, Nicole Forsgren, and Michaela Greiler. 2023. DevEx: What Actually Drives Productivity: The developer-centric approach to measuring and improving productivity. Queue 21, 2 (May 2023), 20:35–20:53. doi:10.1145/3595878

[34]

Ruchika Pandey, Prabhat Singh, Raymond Wei, and Shaila Shankar. 2024. Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects. arXiv:2406.17910 [cs.SE] https://arxiv.org/abs/2406.17910

[35]

Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv:2302.06590 [cs.SE] https://arxiv.org/abs/2302.06590

[36]

Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2023. Do Users Write More Insecure Code with AI Assistants? In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23). Association for Computing Machinery, New York, NY, USA, 2785–2799. doi:10.1145/3576915.3623157

[37]

Gustavo Pinto, Cleidson De Souza, Thayssa Rocha, Igor Steinmacher, Alberto Souza, and Edward Monteiro. 2024. Developer Experiences with a Contextualized AI Coding Assistant: Usability, Expectations, and Outcomes. In Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN ’24). Association for Computing... doi:10.1145/3644815.3644949

[38]

James Prather, Brent N. Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S. Randrianasolo, Brett A. Becker, Bailey Kimmel, Jared Wright, and Ben Briggs. 2024. The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers. In Proceedings of the 2024 ACM Conference on International Computing Education Research, Vol. 1. ACM, 469–486. doi:10.1145/3632620.3671116

[39]

Sanka Rasnayaka, Guanlin Wang, Ridwan Shariffdeen, and Ganesh Neelakanta Iyer. 2024. An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project. doi:10.48550/arXiv.2401.16186

[40]

Abdul Razzaq, Jim Buckley, Qin Lai, Tingting Yu, and Goetz Botterweck. 2024. A Systematic Literature Review on the Influence of Enhanced Developer Experience on Developers’ Productivity: Factors, Practices, and Recommendations. ACM Comput. Surv. 57, 1 (Oct. 2024), 13:1–13:46. doi:10.1145/3687299

[41]

Agnia Sergeyuk, Yaroslav Golubev, Timofey Bryksin, and Iftekhar Ahmed. 2025. Using AI-based coding assistants in practice: State of affairs, perceptions, and ways forward. Information and Software Technology 178 (Feb. 2025), 107610. doi:10.1016/j.infsof.2024.107610

[42]

Stack Overflow. 2024. AI | 2024 Stack Overflow Developer Survey. https://survey.stackoverflow.co/2024/ai

[43]

Stack Overflow. 2025. 2025 Stack Overflow Developer Survey. https://survey.stackoverflow.co/2025/

[44]

Margaret-Anne Storey, Thomas Zimmermann, Christian Bird, Jacek Czerwonka, Brendan Murphy, and Eirini Kalliamvakou. 2021. Towards a Theory of Software Developer Job Satisfaction and Perceived Productivity. IEEE Transactions on Software Engineering 47, 10 (Oct. 2021), 2125–2142. doi:10.1109/TSE.2019.2944354

[45]

Viktoria Stray, Elias Goldmann Brandtzæg, Viggo Tellefsen Wivestad, Astri Barbala, and Nils Brede Moe. 2025. Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study. doi:10.48550/arXiv.2509.20353

[46]

Thiago S. Vaillant, Felipe Deveza de Almeida, Paulo Anselmo M. S. Neto, Cuiyun Gao, Jan Bosch, and Eduardo Santana de Almeida. 2024. Developers’ Perceptions on the Impact of ChatGPT in Software Development: A Survey. http://arxiv.org/abs/2405.12195

[47]

Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. ACM, New Orleans, LA, USA, 1–7. doi:10.1145/3491101.3519665

[48]

Annie Vella and Kelly Blincoe. 2026. Replication package for “The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study”. https://doi.org/10.5281/zenodo.18821767

[49]

Thomas Weber, Maximilian Brandmaier, Albrecht Schmidt, and Sven Mayer. 2024. Significant Productivity Gains through Programming with Large Language Models. Proc. ACM Hum.-Comput. Interact. 8, EICS (June 2024), 256:1–256:29. doi:10.1145/3661145

[50]

Justin D. Weisz, Shraddha Kumar, Michael Muller, Karen-Ellen Browne, Arielle Goldberg, Ellice Heintze, and Shagun Bajpai. 2024. Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise. doi:10.48550/arXiv.2412.06603

[51]

Feiyang Xu, Poonacha K. Medappa, Murat M. Tunc, Martijn Vroegindeweij, and Jan C. Fransoo. 2025. AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden. doi:10.48550/arXiv.2510.10165

[52]

Burak Yetiştiren, Işık Özsoy, Miray Ayerdem, and Eray Tüzün. 2023. Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT. arXiv:2304.10778 [cs.SE] https://arxiv.org/abs/2304.10778

[53]

Han Zhang, Shiyi Wang, and Zijian Li. 2025. The Neurophysiological Paradox of AI-Induced Frustration: A Multimodal Study of Heart Rate Variability, Affective Responses, and Creative Output. Brain Sciences 15, 6 (May 2025), 565. doi:10.3390/brainsci15060565