pith.

arxiv: 2410.01026 · v2 · submitted 2024-10-01 · 💻 cs.SE · cs.HC

Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks

Pith reviewed 2026-05-23 19:56 UTC · model grok-4.3

classification 💻 cs.SE cs.HC
keywords LLM · programming tasks · user studies · human-LLM interaction · code generation · survey · non-determinism

The pith

Survey of LLM programming studies finds high variability from non-determinism in humans and models

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews user studies on how people interact with LLMs during programming tasks. It examines the kinds of requests users make, how they complete tasks with the models, and the resulting benefits and weaknesses. The review also identifies factors related to the human, the LLM, or their combination that influence personal improvement and task success. The key observation is that interactions vary widely because both humans and LLMs behave non-deterministically. This variability calls for more detailed investigation of interaction patterns, and the paper closes with practical advice for researchers and programmers.

Core claim

Drawing from user studies, the survey identifies variability in human-LLM interactions in programming tasks stemming from the non-deterministic nature of both humans and LLMs, which highlights the need for a deeper understanding of these interaction patterns and leads to practical suggestions for researchers and programmers.

What carries the argument

Analysis of user interaction behaviors with LLMs, including request types, task completion strategies, benefits, weaknesses, and factors affecting human enhancement and task performance.

If this is right

  • LLMs offer capabilities for code generation but with mixed impacts on task performance.
  • Factors tied to the human, the LLM, or their interaction affect both human enhancement and task performance.
  • Deeper understanding of interaction patterns is needed due to variability.
  • Practical suggestions can guide researchers and programmers in using LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Tool designers might create interfaces that help stabilize interactions despite non-determinism.
  • Programmers could benefit from training on effective prompting strategies.
  • Future studies comparing different LLM versions or levels of human expertise could be valuable.

Load-bearing premise

The user studies examined provide a representative sample sufficient to identify the common types of requests, strategies, benefits, weaknesses, and influencing factors.

What would settle it

A large new user study that demonstrates highly consistent human-LLM interaction patterns in programming tasks would challenge the highlighted variability.

Figures

Figures reproduced from arXiv: 2410.01026 by Deborah Etsenake, Meiyappan Nagappan.

Figure 1. Human enhancement themes categorized by the number of papers reporting positive, neutral, and negative effects.
Figure 2. LLM response evaluation metric results examined in the papers, grouped by number of papers.
Original abstract

Large Language Models (LLMs) are transforming programming practices, offering significant capabilities for code generation activities. While researchers have explored the potential of LLMs in various domains, this paper focuses on their use in programming tasks, drawing insights from user studies that assess the impact of LLMs on programming tasks. We first examined the user interaction behaviors with LLMs observed in these studies, from the types of requests made to task completion strategies. Additionally, our analysis reveals both benefits and weaknesses of LLMs showing mixed effects on the human and task. Lastly, we looked into what factors from the human, LLM or the interaction of both, affect the human's enhancement as well as the task performance. Our findings highlight the variability in human-LLM interactions due to the non-deterministic nature of both parties (humans and LLMs), underscoring the need for a deeper understanding of these interaction patterns. We conclude by providing some practical suggestions for researchers as well as programmers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. This paper is a literature survey examining user studies on LLM use in programming tasks. It reviews observed interaction behaviors (request types and task completion strategies), identifies benefits and weaknesses with mixed effects on humans and tasks, analyzes factors influencing human enhancement and performance, highlights variability due to non-determinism in both humans and LLMs, and offers practical suggestions for researchers and programmers.

Significance. If the underlying study selection and synthesis prove rigorous, the survey could usefully consolidate findings on human-LLM dynamics in programming, drawing attention to interaction variability and the need for further research while providing actionable suggestions.

major comments (1)
  1. [Methodology] Methodology section: No details are provided on the literature search strategy (databases, keywords, time frame), inclusion/exclusion criteria, number of papers screened versus included, quality assessment, or handling of contradictory results. This is load-bearing for the central claim that the examined user studies reveal representative patterns of variability due to non-determinism, as the abstract and synthesis rest on the assumption that these studies are sufficient and unbiased.
minor comments (1)
  1. [Abstract] Abstract: Adding a brief statement on the number of studies reviewed and the review protocol would improve transparency without lengthening the abstract substantially.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights an important area for strengthening the paper. We agree that the methodology requires additional transparency to support the survey's claims. Our point-by-point response follows.

  1. Referee: [Methodology] Methodology section: No details are provided on the literature search strategy (databases, keywords, time frame), inclusion/exclusion criteria, number of papers screened versus included, quality assessment, or handling of contradictory results. This is load-bearing for the central claim that the examined user studies reveal representative patterns of variability due to non-determinism, as the abstract and synthesis rest on the assumption that these studies are sufficient and unbiased.

    Authors: We agree that the current manuscript does not provide sufficient methodological detail. In the revised version we will add a dedicated Methodology subsection that explicitly describes: the databases and repositories searched (ACM Digital Library, IEEE Xplore, arXiv, Google Scholar), the keyword strings and Boolean queries employed, the time frame (primarily 2022–2024), the inclusion/exclusion criteria (empirical user studies on LLM-assisted programming tasks, English-language, peer-reviewed or preprints with human-subject data), a PRISMA flow diagram reporting screened, eligible, and included papers, any quality or risk-of-bias assessment applied, and the approach taken to synthesize and reconcile contradictory findings. These additions will directly address the concern about representativeness and strengthen the evidential basis for the reported patterns of variability. revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive survey with no derivations or self-referential claims

full rationale

This is a literature survey paper that synthesizes findings from external user studies on LLM use in programming. It contains no equations, predictions, fitted parameters, uniqueness theorems, or ansatzes. The central claim about interaction variability is an interpretive summary of cited studies rather than a derivation that reduces to its own inputs by construction. No self-citation load-bearing steps exist, and the paper does not rename known results or smuggle ansatzes. The derivation chain is absent, making circularity analysis inapplicable; the paper is self-contained as a descriptive review.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey paper with no new mathematical models, parameters, axioms, or invented entities. It relies on the body of existing user studies in the field.

pith-pipeline@v0.9.0 · 5697 in / 1123 out tokens · 26056 ms · 2026-05-23T19:56:44.171636+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media

    cs.CL 2026-05 unverdicted novelty 5.0

    Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 1 Pith paper · 2 internal anchors
