A survey of generative AI adoption and perceived productivity among scientists who program
Pith reviewed 2026-05-16 20:17 UTC · model grok-4.3
The pith
The volume of AI-generated code scientists accept at once is the strongest predictor of their perceived productivity gains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a survey of 868 scientific programmers, adoption of generative AI for coding is highest among students and less experienced programmers, and respondents strongly prefer conversational interfaces over developer-specific tools. Both inexperience and limited use of formal development practices are associated with greater perceived productivity, though these factors interact. The strongest predictor of perceived productivity is the number of lines of generated code typically accepted at once, indicating that scientific programmers may assess tool value primarily through generation volume rather than through subsequent validation or integration efforts.
What carries the argument
The association between perceived productivity and the typical number of lines of AI-generated code accepted at once, together with the statistical interaction between programmer experience and use of development practices.
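What it means for two factors to "interact" can be made concrete with a minimal sketch. It assumes both factors are coded as binary indicators, which is a simplification and not necessarily how the paper models them. Under that assumption, the interaction coefficient of a saturated linear model equals a difference of differences between the four group means. The responses and the names `responses`, `cell_mean`, and `saturated_2x2` are invented for illustration and are not taken from the survey.

```python
from statistics import mean

# Hypothetical toy responses, NOT survey data:
# (inexperienced, limited_practices, perceived productivity on a 1-5 scale)
responses = [
    (0, 0, 2), (0, 0, 3), (0, 0, 2), (0, 0, 3),
    (1, 0, 3), (1, 0, 4), (1, 0, 3), (1, 0, 4),
    (0, 1, 3), (0, 1, 4), (0, 1, 4), (0, 1, 3),
    (1, 1, 4), (1, 1, 4), (1, 1, 5), (1, 1, 3),
]


def cell_mean(rows, inexperienced, limited_practices):
    """Mean score of respondents in one of the four groups."""
    return mean(
        score
        for i, p, score in rows
        if i == inexperienced and p == limited_practices
    )


def saturated_2x2(rows):
    """Coefficients of score = b0 + b1*I + b2*P + b3*I*P for binary I and P.

    With two binary predictors and their product, the least-squares fit
    reproduces the four group means exactly, so each coefficient can be
    read off as a difference of means.
    """
    m00 = cell_mean(rows, 0, 0)
    m10 = cell_mean(rows, 1, 0)
    m01 = cell_mean(rows, 0, 1)
    m11 = cell_mean(rows, 1, 1)
    return {
        "intercept": m00,
        "inexperienced": m10 - m00,
        "limited_practices": m01 - m00,
        # How much the inexperience gap changes when practices are limited.
        "interaction": (m11 - m01) - (m10 - m00),
    }


coefficients = saturated_2x2(responses)
print(coefficients)
```

With these toy values the interaction term is -0.5: the gap between inexperienced and experienced respondents is smaller in the limited-practices group. The sign and size are arbitrary and say nothing about the survey's actual estimates.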
If this is right
- Inexperienced programmers report the largest perceived gains, suggesting AI tools may lower entry barriers into scientific coding.
- The interaction between experience and practices implies that adopting testing or version control can moderate perceived productivity differences.
- Emphasis on code generation volume over validation may increase later debugging costs in research projects.
- Field variation in adoption indicates that recommendations for AI tools should account for domain-specific workflows.
- Preference for general-purpose interfaces over specialized ones highlights usability as a primary driver of tool choice.
Where Pith is reading between the lines
- If high acceptance volumes correlate with lower code quality, scientific projects risk accumulating technical debt from unverified AI contributions.
- Tool interfaces could be redesigned to encourage review of smaller segments, potentially aligning perceived and actual productivity more closely.
- Educational programs for scientific computing might add explicit training on validating AI-generated code to offset the observed associations with inexperience.
- Longitudinal tracking of research outputs would test whether volume-based acceptance leads to faster discovery or hidden delays in verification.
Load-bearing premise
Self-reported perceived productivity accurately reflects actual productivity gains and is not driven by unmeasured confounders such as field-specific norms or individual motivation.
What would settle it
A controlled study that objectively tracks code correctness, debugging time, and overall project completion rates for AI-assisted versus non-assisted scientific tasks, then compares those measures directly to participants' self-reported productivity scores.
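One way to sketch the final comparison step, assuming the study yields one self-reported score and one objective measure per participant, is a Spearman rank correlation, which suits ordinal Likert responses. The function names and all values below are hypothetical; the paper does not specify such an analysis.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical participants, NOT data from any study.
self_reported_gain = [5, 4, 4, 2, 3, 1]            # Likert item, 1-5
debugging_hours = [9.0, 7.5, 8.0, 3.0, 4.0, 2.5]   # objective measure

print(spearman(self_reported_gain, debugging_hours))
```

A coefficient near zero, or one whose sign runs against the self-reports, would indicate that perceived and measured productivity diverge; a real study would also need uncertainty estimates and a pre-specified choice of objective measures.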
Original abstract
Programming is essential to modern scientific research, yet most scientists report inadequate training for the software development their work demands. Generative AI tools capable of code generation may support scientific programmers, but user studies indicate risks of over-reliance, particularly among inexperienced users. We surveyed 868 scientists who program, examining adoption patterns, tool preferences, and factors associated with perceived productivity. Adoption is highest among students and less experienced programmers, with variation across fields. Scientific programmers overwhelmingly prefer general-purpose conversational interfaces like ChatGPT over developer-specific tools. Both inexperience and limited use of development practices (like testing, code review, and version control) are associated with greater perceived productivity -- but these factors interact, suggesting formal practices may partially compensate for inexperience. The strongest predictor of perceived productivity is the number of lines of generated code typically accepted at once. These findings suggest scientific programmers using generative AI may gauge productivity by code generation rather than validation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports findings from an online survey of 868 scientists who engage in programming as part of their research. It describes adoption rates of generative AI code generation tools, noting higher adoption among students and less experienced programmers, and a strong preference for general-purpose conversational interfaces such as ChatGPT. The analysis identifies associations between perceived productivity gains and factors including inexperience, limited adherence to software development practices (e.g., testing, code review, version control), and the typical number of lines of AI-generated code accepted per interaction. The strongest predictor is reported as the volume of code accepted at once, with interactions suggesting that formal practices may mitigate risks associated with inexperience. The authors conclude that scientific programmers may be assessing productivity primarily through code generation volume rather than through validation or integration processes.
Significance. If the reported associations are robust, the survey provides timely empirical data on how generative AI is being integrated into scientific workflows, highlighting potential disparities in adoption and perceived benefits across experience levels. This could inform training programs and tool design for scientific computing. However, the reliance on unvalidated self-reported measures means the significance is primarily descriptive of perceptions rather than causal impacts on actual productivity.
major comments (3)
- Abstract and Results: The identification of the number of lines of generated code accepted at once as the strongest predictor of perceived productivity rests on regression models using a single unvalidated self-reported Likert item as the outcome; no correlation with external proxies (commit velocity, bug rates, or project completion times) is reported, which is load-bearing for the central claim that this metric reflects genuine productivity differences rather than reporting bias.
- Methods: The cross-sectional survey design reports associations between inexperience, limited development practices, and higher perceived productivity without documented controls for confounders such as motivation, field norms, or self-selection into the sample; this omission directly affects the interpretability of the interaction terms highlighted in the abstract.
- Results: The claim that formal development practices partially compensate for inexperience is derived from interaction coefficients on subjective responses, yet the manuscript provides no sensitivity analyses or robustness checks against overestimation by inexperienced respondents, undermining the load-bearing interpretation offered in the discussion.
minor comments (2)
- Abstract: The abstract already states the sample size (868); add any available response rate information to give readers immediate context on potential non-response bias.
- Tables/Figures: Ensure all regression output tables report exact coefficients, standard errors, confidence intervals, and p-values for the key predictors and interaction terms rather than summary statements alone.
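As a rough illustration of what one fully reported table row contains, the sketch below fits a one-predictor least-squares line and returns the coefficient, its standard error, a confidence interval, and a p-value. It is a simplification under stated assumptions: the manuscript's models have several predictors, and the interval and p-value here use a normal approximation rather than a t distribution. The function name `simple_ols_row` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def simple_ols_row(x, y, level=0.95):
    """Slope of y on x with standard error, CI, and two-sided p-value.

    Uses a normal approximation for the CI and p-value (an assumption that
    is reasonable only for large samples). Needs at least three points,
    a non-constant x, and a fit that is not perfect.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual slope standard error.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_error = sqrt(sse / (n - 2) / sxx)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = slope / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "coef": slope,
        "std_error": std_error,
        "ci_low": slope - z_crit * std_error,
        "ci_high": slope + z_crit * std_error,
        "p_value": p_value,
    }


# Hypothetical toy data, NOT survey responses:
# x = category of lines typically accepted at once, y = perceived productivity.
lines_category = [1, 2, 3, 4, 5]
perceived = [2, 4, 5, 4, 5]

print(simple_ols_row(lines_category, perceived))
```

Reporting all five quantities lets readers judge both the size and the precision of an association instead of relying on a summary statement such as "strongest predictor".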
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our survey of generative AI adoption among scientific programmers. The feedback underscores key limitations of self-reported, cross-sectional data, which we address point by point below. We propose targeted revisions to enhance transparency while noting constraints imposed by the original survey design.
Point-by-point responses
- Referee: Abstract and Results: The identification of the number of lines of generated code accepted at once as the strongest predictor of perceived productivity rests on regression models using a single unvalidated self-reported Likert item as the outcome; no correlation with external proxies (commit velocity, bug rates, or project completion times) is reported, which is load-bearing for the central claim that this metric reflects genuine productivity differences rather than reporting bias.
  Authors: We agree that the productivity outcome relies on a single unvalidated self-reported Likert item and that no external proxies were collected. The survey focused on perceptions and did not include objective measures such as commit velocity or bug rates. We will revise the abstract, results, and discussion to explicitly frame all findings as relating to perceived productivity, discuss potential reporting biases, and add a dedicated limitations subsection on this issue.
  Revision: partial
- Referee: Methods: The cross-sectional survey design reports associations between inexperience, limited development practices, and higher perceived productivity without documented controls for confounders such as motivation, field norms, or self-selection into the sample; this omission directly affects the interpretability of the interaction terms highlighted in the abstract.
  Authors: The cross-sectional design inherently limits causal claims and full confounder control. We will expand the methods section to detail the demographic and field controls that were included in the regressions and revise the discussion to explicitly note the absence of controls for motivation or self-selection as a limitation on interpreting the interaction terms.
  Revision: partial
- Referee: Results: The claim that formal development practices partially compensate for inexperience is derived from interaction coefficients on subjective responses, yet the manuscript provides no sensitivity analyses or robustness checks against overestimation by inexperienced respondents, undermining the load-bearing interpretation offered in the discussion.
  Authors: We will add sensitivity analyses and robustness checks to the results section, including stratification by experience level and alternative model specifications to evaluate potential overestimation. These additions will be reported to support the interpretation of the interaction effects.
  Revision: yes
- We cannot add correlations with external productivity proxies (e.g., commit velocity or bug rates) because the survey did not collect such objective data.
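The stratification by experience level mentioned in the responses could look roughly like the sketch below: fit the same simple association separately within each experience stratum and compare the results. The stratum labels, function names, and values are hypothetical, and the authors' actual specification may differ.

```python
from collections import defaultdict


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs.

    Raises ZeroDivisionError if all x values in the list are identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def stratified_slopes(rows):
    """Group (stratum, x, y) rows by stratum and fit one slope per group."""
    strata = defaultdict(list)
    for stratum, x, y in rows:
        strata[stratum].append((x, y))
    return {name: slope(points) for name, points in strata.items()}


# Hypothetical toy rows, NOT survey data:
# (experience stratum, lines-accepted category, perceived productivity 1-5)
rows = [
    ("novice", 1, 2), ("novice", 2, 3), ("novice", 3, 5), ("novice", 4, 5),
    ("expert", 1, 3), ("expert", 2, 3), ("expert", 3, 4), ("expert", 4, 3),
]

for name, value in stratified_slopes(rows).items():
    print(name, round(value, 3))
```

If the association between acceptance volume and perceived productivity is concentrated in the least experienced stratum, that pattern would be consistent with the overestimation concern raised by the referee, though it would not establish it.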
Circularity Check
No circularity: observational survey reports direct associations without derivations or self-referential reductions
full rationale
The paper is a cross-sectional survey of 868 respondents analyzing adoption patterns and associations with a single self-reported perceived productivity item via regression. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes appear in the derivation chain. All reported predictors (e.g., lines of code accepted, inexperience, development practices) and their interactions are computed directly from the survey responses; the analysis does not reduce any claimed result to its own inputs by construction. This is the standard non-circular outcome for descriptive survey work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Self-reported survey responses accurately reflect actual tool usage, experience levels, and perceived productivity.