A survey of generative AI adoption and perceived productivity among scientists who program

Alexis Parker; Gabrielle O'Brien; Jeffrey Carver; Nasir Eisty

arXiv: 2512.19644 · v3 · submitted 2025-12-22 · 💻 cs.SE · cs.HC



Pith reviewed 2026-05-16 20:17 UTC · model grok-4.3


Keywords: generative AI, scientific programming, perceived productivity, code acceptance, survey, development practices, programmer experience, ChatGPT


The pith

The volume of AI-generated code scientists accept at once is the strongest predictor of their perceived productivity gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A survey of 868 scientists who program shows that generative AI adoption is highest among students and less experienced users, who favor general conversational tools like ChatGPT over specialized developer options. Perceived productivity rises with the number of lines of generated code typically accepted in one interaction, and this association is stronger among those with limited programming experience or infrequent use of practices such as testing, code review, and version control. The patterns suggest users gauge tool value mainly by output volume rather than by validation or integration quality. These results matter because programming underpins modern scientific work, so how researchers adopt and evaluate AI assistance can shape both the speed and reliability of research outputs.

Core claim

Through a survey of 868 scientific programmers, adoption of generative AI for coding is highest among students and less experienced programmers, with strong preference for conversational interfaces over developer-specific tools. Both inexperience and limited use of formal development practices are associated with greater perceived productivity, though these factors interact. The strongest predictor of perceived productivity is the number of lines of generated code typically accepted at once, indicating that scientific programmers may assess tool value primarily through generation volume rather than through subsequent validation or integration efforts.

What carries the argument

The association between perceived productivity and the typical number of lines of AI-generated code accepted per interaction, with interactions between programmer experience and use of development practices.
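As a rough illustration of what "an association with interactions" means in a regression, the sketch below fits ordinary least squares with an experience × practices product term. It is not the authors' specification: the variable names (`lines`, `experience`, `practices`), the toy data, and the use of OLS on a Likert-type outcome are all assumptions made here for illustration (an ordinal model would arguably suit a single Likert item better).

```python
# Hypothetical sketch: OLS with an experience x practices interaction term.
# Variable names and toy data are illustrative, not taken from the survey.


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(lines, experience, practices):
    """Intercept, three main effects, and the experience x practices product."""
    return [1.0, lines, experience, practices, experience * practices]


if __name__ == "__main__":
    # Toy responses generated from known coefficients, so OLS should recover them.
    true_coefs = [1.0, 0.04, -0.3, -0.2, 0.1]
    rows = [design_row(l, e, p) for l in (5, 20, 50) for e in (1, 2, 3) for p in (0, 1, 2)]
    y = [sum(b * x for b, x in zip(true_coefs, r)) for r in rows]
    names = ["intercept", "lines", "experience", "practices", "experience:practices"]
    for name, coef in zip(names, ols(rows, y)):
        print(f"{name:>22}: {coef:+.3f}")
```

In this toy parameterization the slope on practices is -0.2 + 0.1 × experience, so it depends on experience; that dependence is exactly what an interaction coefficient encodes, and it is the quantity a claim about practices "compensating" for inexperience would rest on.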

If this is right

Inexperienced programmers report the largest perceived gains, suggesting AI tools may lower entry barriers into scientific coding.
The interaction between experience and practices implies that adopting testing or version control can moderate perceived productivity differences.
Emphasis on code generation volume over validation may increase later debugging costs in research projects.
Field variation in adoption indicates that recommendations for AI tools should account for domain-specific workflows.
Preference for general-purpose interfaces over specialized ones highlights usability as a primary driver of tool choice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If high acceptance volumes correlate with lower code quality, scientific projects risk accumulating technical debt from unverified AI contributions.
Tool interfaces could be redesigned to encourage review of smaller segments, potentially aligning perceived and actual productivity more closely.
Educational programs for scientific computing might add explicit training on validating AI-generated code to offset the observed associations with inexperience.
Longitudinal tracking of research outputs would test whether volume-based acceptance leads to faster discovery or hidden delays in verification.

Load-bearing premise

Self-reported perceived productivity accurately reflects actual productivity gains and is not driven by unmeasured confounders such as field-specific norms or individual motivation.
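The premise matters because an unmeasured trait can manufacture the same kind of pattern the survey reports. The toy simulation below assumes a hypothetical `motivation` variable that drives both the lines accepted and perceived productivity, while lines accepted has no effect of its own; the numbers are invented and only illustrate omitted-variable bias in general, not anything measured in the paper.

```python
# Hypothetical illustration of omitted-variable bias; every number is invented.


def mean(v):
    return sum(v) / len(v)


def simple_slope(x, y):
    """OLS slope of y on x alone (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x1, x2, y):
    """OLS slopes for y ~ x1 + x2 (with intercept), from centred cross-products."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


motivation = list(range(10))                              # unmeasured confounder
jitter = [1 if i % 2 == 0 else -1 for i in range(10)]
lines = [2 * m + j for m, j in zip(motivation, jitter)]   # driven by motivation
perceived = [3 * m for m in motivation]                   # driven ONLY by motivation

naive = simple_slope(lines, perceived)
adjusted_lines, adjusted_motivation = two_predictor_slopes(lines, motivation, perceived)
print(naive, adjusted_lines)  # prints: 1.5 0.0
```

Regressing perceived productivity on lines accepted alone yields a clearly positive slope even though, by construction, lines have no effect; once the confounder enters the model the lines coefficient drops to zero. A survey can only make that adjustment for confounders it actually measured.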

What would settle it

A controlled study that objectively tracks code correctness, debugging time, and overall project completion rates for AI-assisted versus non-assisted scientific tasks, then compares those measures directly to participants' self-reported productivity scores.
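Once objective measures exist, the comparison such a study calls for could start with a rank correlation between each participant's self-reported score and an objective outcome. The sketch computes Spearman's rho with average ranks for ties (Likert answers tie often). The example data, including the `minutes_to_complete` measure, are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical participants: self-reported productivity (1-5 Likert) and
# minutes needed for a fixed task (lower is better).
self_reported = [5, 4, 4, 3, 2, 1]
minutes_to_complete = [30, 42, 35, 50, 48, 61]
rho = spearman(self_reported, minutes_to_complete)
print(f"Spearman rho = {rho:.2f}")  # negative here: higher self-ratings, faster completion
```

A strongly negative rho on a time-to-completion measure would suggest self-reports track reality; a rho near zero would support the referee's concern that perceived productivity is a weak proxy.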

Original abstract

Programming is essential to modern scientific research, yet most scientists report inadequate training for the software development their work demands. Generative AI tools capable of code generation may support scientific programmers, but user studies indicate risks of over-reliance, particularly among inexperienced users. We surveyed 868 scientists who program, examining adoption patterns, tool preferences, and factors associated with perceived productivity. Adoption is highest among students and less experienced programmers, with variation across fields. Scientific programmers overwhelmingly prefer general-purpose conversational interfaces like ChatGPT over developer-specific tools. Both inexperience and limited use of development practices (like testing, code review, and version control) are associated with greater perceived productivity, but these factors interact, suggesting formal practices may partially compensate for inexperience. The strongest predictor of perceived productivity is the number of lines of generated code typically accepted at once. These findings suggest scientific programmers using generative AI may gauge productivity by code generation rather than validation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

A decent-sized survey with some usable numbers on AI tool adoption among scientific programmers, but the productivity findings rest on thin subjective measures without validation or controls.


This paper reports original survey responses from 868 scientists who program, covering adoption rates by experience level, tool preferences, and factors tied to perceived productivity gains from generative AI. The main patterns are higher uptake among students and novices, a clear preference for general conversational tools like ChatGPT over specialized developer ones, and an association where accepting more generated lines at once predicts higher perceived productivity. It also flags an interaction where limited use of standard practices like testing and version control links to greater perceived gains, especially for less experienced users. Those descriptive adoption and preference results are the parts that feel most straightforward and potentially reusable as baseline data. The sample size is reasonable for this kind of work, and the findings on experience-level differences and tool choices give a concrete picture of current behavior in research settings.

The productivity side is weaker. Everything traces back to a single self-reported item with no external anchors such as commit logs, bug rates, or time metrics, and the regressions appear to lack detailed confounder controls for motivation or field norms. The positive link with inexperience could reflect over-optimism or reporting bias rather than real differences, and the claim that lines accepted is the strongest predictor sits on shaky ground without sensitivity checks or triangulation.

This is mainly for people working on research software engineering, AI coding tools, or training programs who want empirical usage patterns rather than causal claims. A reader focused on survey design would want more on sampling and response rates, but the raw numbers on preferences are worth having. Send it for peer review with requests to expand the methods and limitations sections; the descriptive core is solid enough to justify referee time even if the interpretive parts need tightening.

Referee Report

3 major / 2 minor

Summary. The manuscript reports findings from an online survey of 868 scientists who engage in programming as part of their research. It describes adoption rates of generative AI code generation tools, noting higher adoption among students and less experienced programmers, and a strong preference for general-purpose conversational interfaces such as ChatGPT. The analysis identifies associations between perceived productivity gains and factors including inexperience, limited adherence to software development practices (e.g., testing, code review, version control), and the typical number of lines of AI-generated code accepted per interaction. The strongest predictor is reported as the volume of code accepted at once, with interactions suggesting that formal practices may mitigate risks associated with inexperience. The authors conclude that scientific programmers may be assessing productivity primarily through code generation volume rather than through validation or integration processes.

Significance. If the reported associations are robust, the survey provides timely empirical data on how generative AI is being integrated into scientific workflows, highlighting potential disparities in adoption and perceived benefits across experience levels. This could inform training programs and tool design for scientific computing. However, the reliance on unvalidated self-reported measures means the significance is primarily descriptive of perceptions rather than causal impacts on actual productivity.

major comments (3)

Abstract and Results: The identification of the number of lines of generated code accepted at once as the strongest predictor of perceived productivity rests on regression models using a single unvalidated self-reported Likert item as the outcome; no correlation with external proxies (commit velocity, bug rates, or project completion times) is reported, which is load-bearing for the central claim that this metric reflects genuine productivity differences rather than reporting bias.
Methods: The cross-sectional survey design reports associations between inexperience, limited development practices, and higher perceived productivity without documented controls for confounders such as motivation, field norms, or self-selection into the sample; this omission directly affects the interpretability of the interaction terms highlighted in the abstract.
Results: The claim that formal development practices partially compensate for inexperience is derived from interaction coefficients on subjective responses, yet the manuscript provides no sensitivity analyses or robustness checks against overestimation by inexperienced respondents, undermining the load-bearing interpretation offered in the discussion.

minor comments (2)

Abstract: The abstract states the sample size (868) but gives no response rate; add response-rate information so readers have immediate context on survey scale and potential non-response bias.
Tables/Figures: Ensure all regression output tables report exact coefficients, standard errors, confidence intervals, and p-values for the key predictors and interaction terms rather than summary statements alone.
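For a single predictor, the quantities this comment asks for can be computed directly. The sketch reports the slope, its standard error, a 95% interval, and a two-sided p-value. To stay dependency-free it uses a large-sample normal approximation (the 1.96 quantile and `math.erfc`) where a t distribution with n - 2 degrees of freedom would be exact, so treat it as illustrative only; the data are invented.

```python
from math import erfc, sqrt

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def slope_summary(x, y):
    """Slope of y on x with SE, approximate 95% CI, and normal-approximation p-value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    ci = (slope - Z_975 * se, slope + Z_975 * se)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = erfc(abs(slope / se) / sqrt(2)) if se > 0 else 0.0
    return {"coef": slope, "se": se, "ci95": ci, "p": p}


# Invented data: typical lines accepted vs. a perceived-productivity score.
lines_accepted = [1, 2, 3, 4, 5]
perceived = [2.1, 3.9, 6.2, 7.8, 10.0]
print(slope_summary(lines_accepted, perceived))
```

With samples of a few hundred respondents the normal and t intervals are nearly identical; with small strata the t version should be preferred.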

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments on our survey of generative AI adoption among scientific programmers. The feedback underscores key limitations of self-reported, cross-sectional data, which we address point by point below. We propose targeted revisions to enhance transparency while noting constraints imposed by the original survey design.

Point-by-point responses

Referee: Abstract and Results: The identification of the number of lines of generated code accepted at once as the strongest predictor of perceived productivity rests on regression models using a single unvalidated self-reported Likert item as the outcome; no correlation with external proxies (commit velocity, bug rates, or project completion times) is reported, which is load-bearing for the central claim that this metric reflects genuine productivity differences rather than reporting bias.

Authors: We agree that the productivity outcome relies on a single unvalidated self-reported Likert item and that no external proxies were collected. The survey focused on perceptions and did not include objective measures such as commit velocity or bug rates. We will revise the abstract, results, and discussion to explicitly frame all findings as relating to perceived productivity, discuss potential reporting biases, and add a dedicated limitations subsection on this issue. revision: partial
Referee: Methods: The cross-sectional survey design reports associations between inexperience, limited development practices, and higher perceived productivity without documented controls for confounders such as motivation, field norms, or self-selection into the sample; this omission directly affects the interpretability of the interaction terms highlighted in the abstract.

Authors: The cross-sectional design inherently limits causal claims and full confounder control. We will expand the methods section to detail the demographic and field controls that were included in the regressions and revise the discussion to explicitly note the absence of controls for motivation or self-selection as a limitation on interpreting the interaction terms. revision: partial
Referee: Results: The claim that formal development practices partially compensate for inexperience is derived from interaction coefficients on subjective responses, yet the manuscript provides no sensitivity analyses or robustness checks against overestimation by inexperienced respondents, undermining the load-bearing interpretation offered in the discussion.

Authors: We will add sensitivity analyses and robustness checks to the results section, including stratification by experience level and alternative model specifications to evaluate potential overestimation. These additions will be reported to support the interpretation of the interaction effects. revision: yes
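Stratification by experience level, as this response proposes, can be as simple as refitting the lines-accepted slope within each experience group and comparing the slopes. In the sketch below the group labels, the record layout `(experience_level, lines_accepted, perceived_productivity)`, and the data are hypothetical; a fuller check would also report uncertainty for each stratum.

```python
from collections import defaultdict


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def stratified_slopes(records):
    """Fit the lines-accepted slope separately within each experience level.

    records: iterable of (experience_level, lines_accepted, perceived_productivity).
    Strata where lines_accepted never varies are skipped (slope undefined).
    """
    groups = defaultdict(lambda: ([], []))
    for level, lines, score in records:
        xs, ys = groups[level]
        xs.append(lines)
        ys.append(score)
    return {level: slope(xs, ys) for level, (xs, ys) in groups.items() if len(set(xs)) > 1}


# Hypothetical toy data in which the association is steeper for novices.
records = [("novice", l, 2 + 0.10 * l) for l in (5, 10, 20, 40)]
records += [("expert", l, 3 + 0.02 * l) for l in (5, 10, 20, 40)]
print(stratified_slopes(records))
```

Markedly different slopes across strata would be consistent with the reported interaction; similar slopes would suggest the pooled association is not driven by experience composition.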

Standing simulated objections (not resolved)

We cannot add correlations with external productivity proxies (e.g., commit velocity or bug rates) because the survey did not collect such objective data.

Circularity Check

0 steps flagged

No circularity: observational survey reports direct associations without derivations or self-referential reductions

Full rationale

The paper is a cross-sectional survey of 868 respondents analyzing adoption patterns and associations with a single self-reported perceived productivity item via regression. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes appear in the derivation chain. All reported predictors (e.g., lines of code accepted, inexperience, development practices) and their interactions are computed directly from the survey responses; the analysis does not reduce any claimed result to its own inputs by construction. This is the standard non-circular outcome for descriptive survey work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Claims rest on the assumption that self-reported survey responses from 868 participants validly capture behaviors and perceptions, with no free parameters or invented entities introduced.

axioms (1)

Domain assumption: self-reported survey responses accurately reflect actual tool usage, experience levels, and perceived productivity.
Standard assumption in survey research but unverified by objective measures in the reported findings.

pith-pipeline@v0.9.0 · 5457 in / 1209 out tokens · 19422 ms · 2026-05-16T20:17:15.092529+00:00 · methodology


Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 2 internal anchors

[1] Hettrick, S. It’s impossible to conduct research without software, say 7 out of 10 UK researchers (2014).
[2] Wilson, G. Software Carpentry: Lessons learned (2016).
[3] Carver, J. C., Weber, N., Ram, K., Gesing, S. & Katz, D. S. A survey of the state of the practice for research software in the United States. PeerJ Computer Science 8 (2022).
[4] Nangia, U. & Katz, D. S. Surveying the US National Postdoctoral Association regarding software use and training in research (2017).
[5] Ziegler, A. et al. Productivity assessment of neural code completion (2022). URL https://dl.acm.org/doi/abs/10.1145/3520312.3534864
[6] Kumar, A. et al. Intuition to Evidence: Measuring AI’s True Impact on Developer Productivity (2025). URL https://arxiv.org/abs/2509.19708v1
[7] Prather, J. et al. The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 469–486 (2024). URL https://dl.acm.org/doi/10.1145/3632620.3671116
[8] Moradi Dakhel, A. et al. GitHub Copilot AI pair programmer: Asset or Liability? Journal of Systems and Software 203, 111734 (2023).
[9] Mozannar, H., Fourney, A., Bansal, G. & Horvitz, E. Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming. Conference on Human Factors in Computing Systems - Proceedings (2024).
[10] Vaithilingam, P., Zhang, T. & Glassman, E. L. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. Conference on Human Factors in Computing Systems - Proceedings (2022).
[11] Barke, S., James, M. B. & Polikarpova, N. Grounded Copilot: How Programmers Interact with Code-Generating Models. Proceedings of the ACM on Programming Languages 7 (2023).
[12] O’Brien, G. How Scientists Use Large Language Models to Program. Conference on Human Factors in Computing Systems - Proceedings 16 (2025). URL https://dl.acm.org/doi/10.1145/3706598.3713668
[13] O’Brien, G. Threats to scientific software from over-reliance on AI code assistants. Nature Computational Science 5, 701–703 (2025). URL https://www.nature.com/articles/s43588-025-00845-2
[14] Wilson, G. et al. Good enough practices in scientific computing. PLOS Computational Biology 13(6), e1005510 (2017). URL https://doi.org/10.1371/journal.pcbi.1005510
[15] Nguyen-Hoan, L., Flint, S. & Sankaranarayana, R. A Survey of Scientific Software Development. Tech. Rep. (2010). URL http://apollo.anu.edu.au
[16] Prabhu, P. et al. A survey of the practice of computational science. State of the Practice Reports, SC’11 (2011). URL https://dl.acm.org/doi/10.1145/2063348.2063374
[17] Retraction: Inferring the effects of cancer treatment: Divergent results from Early Breast Cancer Trialists’ Collaborative Group meta-analyses of randomized trials and observational data from SEER registries (Journal of Clinical Oncology (34) 803-809 (2016)). Journal of Clinical Oncology 34, 3358–3359 (2016). URL https://ascopubs.org/doi/10.1200/JCO.2016.69.0875
[18] Karraker, A. & Latham, K. Authors’ explanation of the retraction. Journal of Health and Social Behavior 56, 417–419 (2015).
[19] Mandhane, P. J. Notice of Retraction: Hahn LM, et al. Post–COVID-19 Condition in Children. JAMA Pediatrics. 2023;177(11):1226-1228. JAMA Pediatrics (2024). URL https://jamanetwork.com/journals/jamapediatrics/fullarticle/2822489
[20] Weber, T., Brandmaier, M., Schmidt, A. & Mayer, S. Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction 8 (2024).
[21] Becker, J., Rush, N., Barnes, B. & Rein, D. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (2025).
[22] Stray, V., Brandtzæg, E. G., Wivestad, V. T., Barbala, A. & Moe, N. B. Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study (2025). URL https://arxiv.org/abs/2509.20353v1
[23] He, H., Miller, C., Agarwal, S., Kästner, C. & Vasilescu, B. Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development (2025). URL http://arxiv.org/abs/2511.04427
[24] Fawzy, A., Tahir, A. & Blincoe, K. Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook - a Grey Literature Review (2025). URL https://doi.org/10.1145/nnnnnnn.nnnnnnn
[25] Nguyen, S. et al. How Beginning Programmers and Code LLMs (Mis)read Each Other (2024).
[26] Prather, J. et al. “It’s Weird That it Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Transactions on Computer-Human Interaction 31 (2023). URL https://dl.acm.org/doi/10.1145/3617367
[27] Panko, R. R. Two Experiments in Reducing Overconfidence in Spreadsheet Development. Journal of Organizational and End User Computing 19, 1–23 (2008). URL https://arxiv.org/abs/0804.0941v1
[28] Ko, A. J. et al. The State of the Art in End-User Software Engineering.
[29] Teal, T. K. et al. Data Carpentry: Workshops to Increase Data Literacy for Researchers. International Journal of Digital Curation 10 (2015).
[30] Eisty, N. U., Kanewala, U. & Carver, J. C. Testing Research Software: An In-Depth Survey of Practices, Methods, and Tools (2025). URL http://arxiv.org/abs/2501.17739
[31] Kanewala, U. & Bieman, J. M. Testing scientific software: A systematic literature review. Information and Software Technology 56, 1219–1232 (2014).
[32] Vogel, T., Druskat, S., Scheidgen, M., Draxl, C. & Grunske, L. Challenges for verifying and validating scientific software in computational materials science. Proceedings - 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science, SE4Science 2019, 25–32 (2019).
[33] Ariful Islam Malik, M. et al. Peer Code Review in Research Software Development: The Research Software Engineer Perspective (2025).
[34] Jay, C. “Not everyone can use Git”: Research Software Engineers’ recommendations for scientist-centred software support (and what researchers really think of them). Tech. Rep. URL http://man.ac.uk/04Y6Bo
[35] Jesse, K., Ahmed, T., Devanbu, P. T. & Morgan, E. Large Language Models and Simple, Stupid Bugs (2023).
[36] Wang, Z. et al. Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models (2025). URL https://arxiv.org/abs/2406.08731v2
[37] Van Noorden, R. & Perkel, J. M. AI and science: what 1,600 researchers think (2023).
[38] Arroyo-Machado, W. et al. Generative AI and academic scientists in US universities: Perception, experience, and adoption intentions. PLOS ONE 20, e0330416 (2025). URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330416
[39] Chugunova, M. et al. Who Uses AI in Research, and for What? Large-scale Survey Evidence from Germany (2025). URL https://papers.ssrn.com/abstract=5259847
[40] Treude, C. & Gerosa, M. A. How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering (2025). URL https://arxiv.org/abs/2501.08774v2
[41] Drasgow, F. Polychoric and Polyserial Correlations. In Kotz, S. & Johnson, N. (eds) The Encyclopedia of Statistics, Vol. 7, 68–74 (Wiley, 1986).
[42] Forsgren, N. et al. The SPACE of Developer Productivity. Queue 19 (2021). URL https://spawn-queue.acm.org/doi/10.1145/3454122.3454124
[43] Goddard, K., Roudsari, A. & Wyatt, J. C. Automation bias: A systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association 19, 121–127 (2012). URL https://dx.doi.org/10.1136/amiajnl-2011-000089
[44] Paine, D. & Lee, C. P. “Who Has Plots?”: Contextualizing Scientific Software, Practice, and Visualizations 1, 85 (2017). URL https://doi.org/10.1145/3134720
[45] Alpernas, K., Feldman, Y. M. & Peleg, H. The wonderful wizard of LoC: Paying attention to the man behind the curtain of lines-of-code metrics. Onward! 2020 - Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Co-located with SPLASH 2020, 146–156 (2020). URL https://dl.acm.org/...
[46] Solla, M., Patel, A. & Wills, C. New metric for measuring programmer productivity. ISCI 2011 - 2011 IEEE Symposium on Computers and Informatics, 177–182 (2011).
[47] Xiao, T. et al. Self-Admitted GenAI Usage in Open-Source Software (2025). URL https://arxiv.org/abs/2507.10422v2
[48] Prabhudesai, S. et al. “Here the GPT made a choice, and every choice can be biased”: How Students Critically Engage with LLMs through End-User Auditing Activity (2025). URL https://doi.org/10.1145/3706598.3713714
[49] Lee, H. P. H. et al. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers (2025).
[50] R Core Team. R: A Language and Environment for Statistical Computing (2025). URL https://www.R-project.org/
[51] Hlavac, M. stargazer: Well-Formatted Regression and Summary Statistics Tables (2022). URL https://CRAN.R-project.org/package=stargazer
[52] Wei, T. & Simko, V. corrplot: Visualization of a Correlation Matrix (2024). URL https://github.com/taiyun/corrplot

[1] [1]

It’s impossible to conduct research without software, say 7 out of 10 UK researchers (2014)

Hettrick, S. It’s impossible to conduct research without software, say 7 out of 10 UK researchers (2014)

work page 2014

[2] [2]

Software Carpentry: Lessons learned (2016)

Wilson, G. Software Carpentry: Lessons learned (2016)

work page 2016

[3] [3]

C., Weber, N., Ram, K., Gesing, S

Carver, J. C., Weber, N., Ram, K., Gesing, S. & Katz, D. S. A survey of the state of the practice for research software in the United States.PeerJ Computer Science8(2022)

work page 2022

[4] [4]

& Katz, D

Nangia, U. & Katz, D. S. Surveying the US National Postdoctoral Association regarding software use and training in research (2017)

work page 2017

[5] [5]

URL https://dl.acm.org/doi/abs/10.1145/3520312.3534864

Ziegler, A.et al.Productivity assessment of neural code completion (2022). URL https://dl.acm.org/doi/abs/10.1145/3520312.3534864

work page doi:10.1145/3520312.3534864 2022

[6] [6]

URL https://arxiv.org/abs/2509.19708v1

Kumar, A.et al.Intuition to Evidence: Measuring AI’s True Impact on Developer Productivity (2025). URL https://arxiv.org/abs/2509.19708v1

work page arXiv 2025

[7] [7]

Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S

Prather, J.et al.The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers.Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1469–486 (2024). URL https://dl.acm. org/doi/10.1145/3632620.3671116

work page doi:10.1145/3632620.3671116 2024

[8] [8]

Moradi Dakhel, A.et al.GitHub Copilot AI pair programmer: Asset or Liability? Journal of Systems and Software203, 111734 (2023)

work page 2023

[9] [9]

& Horvitz, E

Mozannar, H., Fourney, A., Bansal, G. & Horvitz, E. Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming.Conference on Human Factors in Computing Systems - Proceedings(2024)

work page 2024

[10] [10]

& Glassman, E

Vaithilingam, P., Zhang, T. & Glassman, E. L. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models.Conference on Human Factors in Computing Systems - Proceedings (2022)

work page 2022

[11] [11]

Barke, S., James, M. B. & Polikarpova, N. Grounded Copilot: How Programmers Interact with Code-Generating Models.Proceedings of the ACM on Programming Languages7(2023)

work page 2023

[12] [12]

How Scientists Use Large Language Models to Program.Conference on Human Factors in Computing Systems - Proceedings16 (2025)

O’Brien, G. How Scientists Use Large Language Models to Program.Conference on Human Factors in Computing Systems - Proceedings16 (2025). URL https: //dl.acm.org/doi/10.1145/3706598.3713668

work page doi:10.1145/3706598.3713668 2025

[13] [13]

Threats to scientific software from over-reliance on AI code assistants

O’Brien, G. Threats to scientific software from over-reliance on AI code assistants. Nature Computational Science 2025 5:95, 701–703 (2025). URL https://www. nature.com/articles/s43588-025-00845-2. 36

work page 2025

[14] [14]

PLOS Computational Biology 13(6), e1005510 (Jun 2017)

Wilson, G.et al.Good enough practices in scientific computing (2017). URL https://doi.org/10.1371/journal.pcbi.1005510

work page doi:10.1371/journal.pcbi.1005510 2017

[15] [15]

& Sankaranarayana, R

Nguyen-Hoan, L., Flint, S. & Sankaranarayana, R. A Survey of Scientific Software Development. Tech. Rep. (2010). URL http://apollo.anu.edu.au

work page 2010

[16] [16]

URL https://dl.acm.org/doi/10.1145/2063348

Prabhu, P.et al.A survey of the practice of computational science.State of the Practice Reports, SC’11(2011). URL https://dl.acm.org/doi/10.1145/2063348. 2063374

work page doi:10.1145/2063348 2011

[17] Retraction: Inferring the effects of cancer treatment: Divergent results from Early Breast Cancer Trialists’ Collaborative Group meta-analyses of randomized trials and observational data from SEER registries (Journal of Clinical Oncology (34) 803-809 (2016)). Journal of Clinical Oncology 34, 3358–3359 (2016). URL https://ascopubs.org/doi/10.1200/JCO.2016.69.0875

[18] Karraker, A. & Latham, K. Authors’ explanation of the retraction. Journal of Health and Social Behavior 56, 417–419 (2015)

[19] Mandhane, P. J. Notice of Retraction: Hahn LM, et al. Post–COVID-19 Condition in Children. JAMA Pediatrics. 2023;177(11):1226-1228. JAMA Pediatrics (2024). URL https://jamanetwork.com/journals/jamapediatrics/fullarticle/2822489

[20] Weber, T., Brandmaier, M., Schmidt, A. & Mayer, S. Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction 8 (2024)

[21] Becker, J., Rush, N., Barnes, B. & Rein, D. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

[22] Stray, V., Brandtzæg, E. G., Wivestad, V. T., Barbala, A. & Moe, N. B. Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study (2025). URL https://arxiv.org/abs/2509.20353v1

[23] He, H., Miller, C., Agarwal, S., Kästner, C. & Vasilescu, B. Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development (2025). URL http://arxiv.org/abs/2511.04427

[24] Fawzy, A., Tahir, A. & Blincoe, K. Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook - a Grey Literature Review 1 (2025). URL https://doi.org/10.1145/nnnnnnn.nnnnnnn

[25] Nguyen, S. et al. How Beginning Programmers and Code LLMs (Mis)read Each Other (2024)

[26] Prather, J. et al. “It’s Weird That it Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Transactions on Computer-Human Interaction 31 (2023). URL https://dl.acm.org/doi/10.1145/3617367

[27] Panko, R. R. Two Experiments in Reducing Overconfidence in Spreadsheet Development. Journal of Organizational and End User Computing 19, 1–23 (2008). URL https://arxiv.org/abs/0804.0941v1

[28] Ko, A. J. et al. The State of the Art in End-User Software Engineering

[29] Teal, T. K. et al. Data Carpentry: Workshops to Increase Data Literacy for Researchers. International Journal of Digital Curation 10 (2015)

[30] Eisty, N. U., Kanewala, U. & Carver, J. C. Testing Research Software: An In-Depth Survey of Practices, Methods, and Tools (2025). URL http://arxiv.org/abs/2501.17739

[31] Kanewala, U. & Bieman, J. M. Testing scientific software: A systematic literature review. Information and Software Technology 56, 1219–1232 (2014)

[32] Vogel, T., Druskat, S., Scheidgen, M., Draxl, C. & Grunske, L. Challenges for verifying and validating scientific software in computational materials science. Proceedings - 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science, SE4Science 2019, 25–32 (2019)

[33] Ariful Islam Malik, M. et al. Peer Code Review in Research Software Development: The Research Software Engineer Perspective (2025)

[34] Jay, C. “Not everyone can use Git”: Research Software Engineers’ recommendations for scientist-centred software support (and what researchers really think of them). Tech. Rep. URL http://man.ac.uk/04Y6Bo

[35] Jesse, K., Ahmed, T., Devanbu, P. T. & Morgan, E. Large Language Models and Simple, Stupid Bugs (2023)

[36] Wang, Z. et al. Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models (2025). URL https://arxiv.org/abs/2406.08731v2

[37] Van Noorden, R. & Perkel, J. M. AI and science: what 1,600 researchers think (2023)

[38] Arroyo-Machado, W. et al. Generative AI and academic scientists in US universities: Perception, experience, and adoption intentions. PLOS ONE 20, e0330416 (2025). URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330416

[39] Chugunova, M. et al. Who Uses AI in Research, and for What? Large-scale Survey Evidence from Germany (2025). URL https://papers.ssrn.com/abstract=5259847

[40] Treude, C. & Gerosa, M. A. How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering (2025). URL https://arxiv.org/abs/2501.08774v2

[41] Drasgow, F. in Polychoric and Polyserial Correlations (eds Kotz, S. & Johnson, N.) The Encyclopedia of Statistics, Vol. 7, 68–74 (Wiley, 1986)

[42] Forsgren, N. et al. The SPACE of Developer Productivity. Queue 19 (2021). URL https://spawn-queue.acm.org/doi/10.1145/3454122.3454124

[43] Goddard, K., Roudsari, A. & Wyatt, J. C. Automation bias: A systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association 19, 121–127 (2012). URL https://dx.doi.org/10.1136/amiajnl-2011-000089

[44] Paine, D. & Lee, C. P. “Who Has Plots?”: Contextualizing Scientific Software, Practice, and Visualizations 1, 85 (2017). URL https://doi.org/10.1145/3134720

[45] Alpernas, K., Feldman, Y. M. & Peleg, H. The wonderful wizard of LoC: Paying attention to the man behind the curtain of lines-of-code metrics. Onward! 2020 - Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Co-located with SPLASH 2020, 146–156 (2020). URL https://dl.acm.org/...

[46] Solla, M., Patel, A. & Wills, C. New metric for measuring programmer productivity. ISCI 2011 - 2011 IEEE Symposium on Computers and Informatics, 177–182 (2011)

[47] Xiao, T. et al. Self-Admitted GenAI Usage in Open-Source Software (2025). URL https://arxiv.org/abs/2507.10422v2

[48] Prabhudesai, S. et al. “Here the GPT made a choice, and every choice can be biased”: How Students Critically Engage with LLMs through End-User Auditing Activity (2025). URL https://doi.org/10.1145/3706598.3713714

[49] Lee, H. P. H. et al. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers (2025)

[50] R Core Team. R: A Language and Environment for Statistical Computing (2025). URL https://www.R-project.org/

[51] Hlavac, M. stargazer: Well-Formatted Regression and Summary Statistics Tables (2022). URL https://CRAN.R-project.org/package=stargazer

[52] Wei, T. & Simko, V. corrplot: Visualization of a Correlation Matrix (2024). URL https://github.com/taiyun/corrplot