Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks

Deborah Etsenake; Meiyappan Nagappan

arxiv: 2410.01026 · v2 · submitted 2024-10-01 · 💻 cs.SE · cs.HC

Pith reviewed 2026-05-23 19:56 UTC · model grok-4.3

keywords: LLM · programming tasks · user studies · human-LLM interaction · code generation · survey · non-determinism

The pith

Survey of LLM programming studies finds high variability from non-determinism in humans and models

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews user studies on how people interact with LLMs during programming tasks. It looks at the kinds of requests made, how users complete tasks with the models, and the benefits and weaknesses that result. The review also identifies factors related to the human, the LLM, or their combination that influence personal improvement and task success. The key observation is that interactions differ greatly because both humans and LLMs behave in non-deterministic ways. This variability calls for more detailed investigation into the patterns of these interactions, along with some practical advice for users and researchers.

Core claim

Drawing from user studies, the survey identifies variability in human-LLM interactions in programming tasks stemming from the non-deterministic nature of both humans and LLMs, which highlights the need for a deeper understanding of these interaction patterns and leads to practical suggestions for researchers and programmers.
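One commonly cited source of variability on the model side is sampling-based decoding, in which each output is drawn at random from a probability distribution. The summary above does not say which mechanisms the reviewed studies identify, so the Python sketch below is only an illustration under that assumption. The candidate completions, their scores, and all function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_completion(candidates, logits, temperature, rng):
    """Draw one candidate at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates and scores, for illustration only.
CANDIDATES = ["for-loop solution", "list comprehension", "recursive solution"]
LOGITS = [2.0, 1.5, 0.5]


def distinct_outputs(temperature, runs=200, seed=0):
    """Count how many different completions appear across identical requests."""
    rng = random.Random(seed)
    seen = {
        sample_completion(CANDIDATES, LOGITS, temperature, rng)
        for _ in range(runs)
    }
    return len(seen)
```

At a very low temperature the sampler returns the top-scoring candidate on every run, while at a temperature of 1.0 repeated identical requests can return different candidates.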

What carries the argument

Analysis of user interaction behaviors with LLMs, including request types, task completion strategies, benefits, weaknesses, and factors affecting human enhancement and task performance.

If this is right

LLMs offer capabilities for code generation but with mixed impacts on task performance.
Factors arising from the human, the LLM, or their interaction affect both human enhancement and task performance.
Deeper understanding of interaction patterns is needed due to variability.
Practical suggestions can guide researchers and programmers in using LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Tool designers might create interfaces that help stabilize interactions despite non-determinism.
Programmers could benefit from training on effective prompting strategies.
This suggests value in future studies comparing different LLM versions or human expertise levels.

Load-bearing premise

The user studies examined provide a representative sample sufficient to identify the common types of requests, strategies, benefits, weaknesses, and influencing factors.

What would settle it

A large new user study that demonstrates highly consistent human-LLM interaction patterns in programming tasks would challenge the highlighted variability.

Figures

Figures reproduced from arXiv: 2410.01026 by Deborah Etsenake, Meiyappan Nagappan.

**Figure 1.** Human enhancement themes categorized by the number of papers reporting positive, neutral, and negative effects.

**Figure 2.** LLM response evaluation metric results as examined in the papers, grouped by number of papers.
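A tally of the kind Figure 1 describes, papers counted per theme and effect direction, could be computed as in the Python sketch below. This is not the authors' procedure: the record layout, the theme names, and the paper identifiers are placeholders invented for illustration, not data from the survey.

```python
from collections import Counter, defaultdict


def tally_effects(findings):
    """Count papers per theme and effect direction.

    `findings` is an iterable of (paper_id, theme, effect) tuples, where
    effect is "positive", "neutral", or "negative".
    """
    seen = set()
    counts = defaultdict(Counter)
    for paper_id, theme, effect in findings:
        key = (paper_id, theme, effect)
        if key in seen:  # count each paper once per theme/effect pair
            continue
        seen.add(key)
        counts[theme][effect] += 1
    return {theme: dict(counter) for theme, counter in counts.items()}


# Placeholder records, not data from the survey.
example = [
    ("paper-A", "productivity", "positive"),
    ("paper-B", "productivity", "neutral"),
    ("paper-C", "productivity", "positive"),
    ("paper-A", "learning", "negative"),
    ("paper-A", "learning", "negative"),  # duplicate entry, ignored
]

print(tally_effects(example))
```

A single paper can appear under several themes, so the per-theme counts need not sum to the number of papers reviewed.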

Original abstract

Large Language Models (LLMs) are transforming programming practices, offering significant capabilities for code generation activities. While researchers have explored the potential of LLMs in various domains, this paper focuses on their use in programming tasks, drawing insights from user studies that assess the impact of LLMs on programming tasks. We first examined the user interaction behaviors with LLMs observed in these studies, from the types of requests made to task completion strategies. Additionally, our analysis reveals both benefits and weaknesses of LLMs showing mixed effects on the human and task. Lastly, we looked into what factors from the human, LLM or the interaction of both, affect the human's enhancement as well as the task performance. Our findings highlight the variability in human-LLM interactions due to the non-deterministic nature of both parties (humans and LLMs), underscoring the need for a deeper understanding of these interaction patterns. We conclude by providing some practical suggestions for researchers as well as programmers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey collects user studies on LLM-assisted programming and flags interaction variability from non-determinism, but its own selection and synthesis process is not described.

The main thing to know is that this paper is a literature survey summarizing user studies on how people use LLMs for programming tasks. It covers request types, completion strategies, benefits and weaknesses, and factors that shape outcomes, then concludes that variability comes from the non-deterministic nature of both humans and models and calls for deeper work. It closes with practical suggestions for researchers and programmers. That is the scope and the central message.

It organizes existing findings into those categories in a straightforward way, which could save someone new to the area some time when scanning the literature. The suggestions at the end are concrete enough to be usable.

The soft spot is exactly the one the stress-test note flags. The abstract supplies no search strategy, inclusion criteria, quality assessment, or handling of conflicting results, so the claim that the examined studies are representative enough to reveal the patterns rests on an unverified assumption. Without those details it is hard to judge whether the variability conclusion is well-supported or whether important studies were missed. If the full paper contains a clear methods section with a reproducible protocol and a table of included work, that would fix the gap; based on what is visible it remains a limitation for any survey.

This paper is for software engineering or HCI researchers who want a quick map of current user-study evidence on LLM coding tools. A reader looking for background or starting points could extract value from the categorization and the suggestions. It deserves a serious referee because the topic is active and the synthesis, once the methods are documented, would be a usable reference even without new data or frameworks. I would recommend sending it for review with a request to add the missing methodological details.

Referee Report

1 major / 1 minor

Summary. This paper is a literature survey examining user studies on LLM use in programming tasks. It reviews observed interaction behaviors (request types and task completion strategies), identifies benefits and weaknesses with mixed effects on humans and tasks, analyzes factors influencing human enhancement and performance, highlights variability due to non-determinism in both humans and LLMs, and offers practical suggestions for researchers and programmers.

Significance. If the underlying study selection and synthesis prove rigorous, the survey could usefully consolidate findings on human-LLM dynamics in programming, drawing attention to interaction variability and the need for further research while providing actionable suggestions.

major comments (1)

[Methodology] Methodology section: No details are provided on the literature search strategy (databases, keywords, time frame), inclusion/exclusion criteria, number of papers screened versus included, quality assessment, or handling of contradictory results. This is load-bearing for the central claim that the examined user studies reveal representative patterns of variability due to non-determinism, as the abstract and synthesis rest on the assumption that these studies are sufficient and unbiased.

minor comments (1)

[Abstract] Abstract: Adding a brief statement on the number of studies reviewed and the review protocol would improve transparency without lengthening the abstract substantially.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights an important area for strengthening the paper. We agree that the methodology requires additional transparency to support the survey's claims. Our point-by-point response follows.

Referee: [Methodology] Methodology section: No details are provided on the literature search strategy (databases, keywords, time frame), inclusion/exclusion criteria, number of papers screened versus included, quality assessment, or handling of contradictory results. This is load-bearing for the central claim that the examined user studies reveal representative patterns of variability due to non-determinism, as the abstract and synthesis rest on the assumption that these studies are sufficient and unbiased.

Authors: We agree that the current manuscript does not provide sufficient methodological detail. In the revised version we will add a dedicated Methodology subsection that explicitly describes: the databases and repositories searched (ACM Digital Library, IEEE Xplore, arXiv, Google Scholar), the keyword strings and Boolean queries employed, the time frame (primarily 2022–2024), the inclusion/exclusion criteria (empirical user studies on LLM-assisted programming tasks, English-language, peer-reviewed or preprints with human-subject data), a PRISMA flow diagram reporting screened, eligible, and included papers, any quality or risk-of-bias assessment applied, and the approach taken to synthesize and reconcile contradictory findings. These additions will directly address the concern about representativeness and strengthen the evidential basis for the reported patterns of variability.

Revision: yes
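The inclusion criteria named in the rebuttal can be read as a filter over candidate records, with the counts at each stage feeding a PRISMA-style flow diagram. The Python sketch below is one hedged reading of those criteria, not the authors' protocol. The `Record` fields, the title-based deduplication, the strict 2022 to 2024 window (the rebuttal says "primarily"), and the interpretation that preprints qualify only when they report human-subject data are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are illustrative."""
    title: str
    year: int
    language: str
    is_empirical_user_study: bool
    has_human_subject_data: bool
    peer_reviewed: bool
    is_preprint: bool


def meets_criteria(record, first_year=2022, last_year=2024):
    """Apply the stated inclusion criteria under the assumptions above."""
    in_window = first_year <= record.year <= last_year
    acceptable_source = record.peer_reviewed or (
        record.is_preprint and record.has_human_subject_data
    )
    return (
        in_window
        and record.language == "English"
        and record.is_empirical_user_study
        and acceptable_source
    )


def screen(records):
    """Deduplicate by case-insensitive title, filter, and report stage counts."""
    unique = list({r.title.casefold(): r for r in records}.values())
    included = [r for r in unique if meets_criteria(r)]
    counts = {
        "identified": len(records),
        "after_deduplication": len(unique),
        "included": len(included),
    }
    return counts, included
```

Reporting the three counts alongside the list of included papers is what would let a reader judge how representative the sample is.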

Circularity Check

0 steps flagged

No circularity: purely descriptive survey with no derivations or self-referential claims

This is a literature survey paper that synthesizes findings from external user studies on LLM use in programming. It contains no equations, predictions, fitted parameters, uniqueness theorems, or ansatzes. The central claim about interaction variability is an interpretive summary of cited studies rather than a derivation that reduces to its own inputs by construction. No self-citation load-bearing steps exist, and the paper does not rename known results or smuggle ansatzes. The derivation chain is absent, making circularity analysis inapplicable; the paper is self-contained as a descriptive review.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey paper with no new mathematical models, parameters, axioms, or invented entities. It relies on the body of existing user studies in the field.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media
cs.CL · 2026-05 · unverdicted · novelty 5.0

Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

IEEE Standard for a Software Quality Metrics Methodology

1998. IEEE Standard for a Software Quality Metrics Methodology. https: //standards.ieee.org/standard/1061-1998.html

work page 1998
[2]

IEEE Standard for Software Quality Assurance Processes

2014. IEEE Standard for Software Quality Assurance Processes. https:// standards.ieee.org/standard/730-2014.html

work page 2014
[3]

Mathieu Acher, José Galindo Duarte, and Jean-Marc Jézéquel. 2023. On Program- ming Variability with Large Language Model-based Assistant. In Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A (Tokyo, Japan) (SPLC ’23). Association for Computing Machinery, New York, NY, USA, 8–14. https://doi.org/10.1145/35790...

work page doi:10.1145/3579027.3608972 2023
[4]

Mathieu Acher and Jabier Martinez. 2023. Generative AI for Reengineering Variants into Software Product Lines: An Experience Report. In Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume B (Tokyo, Japan) (SPLC ’23). Association for Computing Machinery, New York, NY, USA, 57–66. https://doi.org/10.1145/3579028.3609016

work page doi:10.1145/3579028.3609016 2023
[5]

Santiago Aillon, Alejandro Garcia, Nicolas Velandia, Daniel Zarate, and Pedro Wightman. 2023. Empirical evaluation of automated code generation for mobile Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks Conference’17, July 2017, Washington, DC, USA applications by AI tools. In 2023 IEEE Colombian Caribbean Conferen...

work page doi:10.1109/c358072.2023.10436306 2023
[6]

Naser Al Madi. 2023. How Readable is Model-Generated Code? Examining Readability and Visual Inspection of GitHub Copilot. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE ’22). Association for Computing Machinery, New York, NY, USA, Article 205, 5 pages. https://doi.org/10.1145/355134...

work page doi:10.1145/3551349.3560438 2023
[7]

Glassman

Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, and Elena L. Glassman. 2024. ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing. In Proceedings of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 304, 18 pa...

work page doi:10.1145/3613904.3642016 2024
[8]

Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, and Dhruv Kumar. 2024. Analyzing LLM Usage in an Advanced Computing Class in India. arXiv:2404.04603 [cs.HC] https://arxiv. org/abs/2404.04603

work page arXiv 2024
[9]

Owura Asare, Meiyappan Nagappan, and N. Asokan. 2024. A User-centered Security Evaluation of Copilot. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 158, 11 pages. https: //doi.org/10.1145/3597503.3639154

work page doi:10.1145/3597503.3639154 2024
[10]

James, and Nadia Polikarpova

Shraddha Barke, Michael B. James, and Nadia Polikarpova. 2023. Grounded Copilot: How Programmers Interact with Code-Generating Models. Proc. ACM Program. Lang. 7, OOPSLA1, Article 78 (apr 2023), 27 pages. https://doi.org/10. 1145/3586030

work page 2023
[11]

Like a Nesting Doll

Seth Bernstein, Paul Denny, Juho Leinonen, Lauren Kan, Arto Hellas, Matt Little- field, Sami Sarsa, and Stephen MacNeil. 2024. "Like a Nesting Doll": Analyzing Recursion Analogies Generated by CS Students using Large Language Models. arXiv:2403.09409 [cs.HC] https://arxiv.org/abs/2403.09409

work page arXiv 2024
[12]

Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2023. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 20, 6 (jan 2023), 35–57. https://doi.org/10.1145/3582083

work page doi:10.1145/3582083 2023
[13]

Courtni Byun, Piper Vasicek, and Kevin Seppi. 2023. Dispensing with Humans in Human-Computer Interaction Research. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI EA ’23). Association for Computing Machinery, New York, NY, USA, Article 413, 26 pages. https://doi.org/10.1145/3544549.3582749

work page doi:10.1145/3544549.3582749 2023
[14]

Sayan Chatterjee, Ching Louis Liu, Gareth Rowland, and Tim Hogarth. 2024. The Impact of AI Tool on Engineering at ANZ Bank An Empirical Study on GitHub Copilot within Corporate Environment. arXiv:2402.05636 [cs.SE] https://arxiv.org/abs/2402.05636

work page arXiv 2024
[15]

Bei Chen, Daoguang Zan, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, and Jian-Guang Lou. 2023. Large Language Models Meet NL2Code: A Survey. InACL 2023. https://www.microsoft.com/en-us/research/publication/ large-language-models-meet-nl2code-a-survey/

work page 2023
[16]

Bhavya Chopra, Yasharth Bajpai, Param Biyani, Gustavo Soares, Arjun Rad- hakrishna, Chris Parnin, and Sumit Gulwani. 2024. Exploring Interaction Pat- terns for Debugging: Enhancing Conversational Capabilities of AI-assistants. arXiv:2402.06229 [cs.HC] https://arxiv.org/abs/2402.06229

work page arXiv 2024
[17]

Bhavya Chopra, Ananya Singha, Anna Fariha, Sumit Gulwani, Chris Parnin, Ashish Tiwari, and Austin Z. Henley. 2023. Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities. arXiv:2310.16164 [cs.HC] https://arxiv.org/abs/2310.16164

work page arXiv 2023
[18]

Rudrajit Choudhuri, Dylan Liu, Igor Steinmacher, Marco Gerosa, and Anita Sarma. 2024. How Far Are We? The Triumphs and Trials of Generative AI in Learning Software Engineering. In Proceedings of the IEEE/ACM 46th Interna- tional Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Asso- ciation for Computing Machinery, New York, NY, USA, Arti...

work page doi:10.1145/3597503.3639201 2024
[19]

Bruno Pereira Cipriano and Pedro Alves. 2023. GPT-3 vs Object Oriented Programming Assignments: An Experience Report. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 (, Turku, Finland,) (ITiCSE 2023). Association for Computing Machinery, New York, NY, USA, 61–67. https://doi.org/10.1145/3587102.3588814

work page doi:10.1145/3587102.3588814 2023
[20]

Computer Emergency Response Team. [n. d.]. CERT Secure Coding Standards. https://www.securecoding.cert.org/

work page
[21]

Javier Cámara, Javier Troya, Luis Burgueño, et al. 2023. On the assessment of generative AI in modeling tasks: an experience report with ChatGPT and UML. Software and Systems Modeling 22, 3 (2023), 781–793. https://doi.org/10.1007/ s10270-023-01105-5

work page 2023
[22]

Smith IV au2, Max Fowler, James Prather, Brett A

Paul Denny, David H. Smith IV au2, Max Fowler, James Prather, Brett A. Becker, and Juho Leinonen. 2024. Explaining Code with a Purpose: An In- tegrated Approach for Developing Code Comprehension and Prompting Skills. arXiv:2403.06050 [cs.HC] https://arxiv.org/abs/2403.06050

work page arXiv 2024
[23]

Becker, and Brent N

Paul Denny, Juho Leinonen, James Prather, Andrew Luxton-Reilly, Thezyrie Amarouche, Brett A. Becker, and Brent N. Reeves. 2024. Prompt Problems: A New Programming Exercise for the Generative AI Era. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (Portland, OR, USA) (SIGCSE 2024). Association for Computing Machinery, ...

work page doi:10.1145/3626252.3630909 2024
[24]

Amir Dirin and Teemu Laine. 2024. Examining the Utilization of Artificial Intelligence Tools by Students in Software Engineering Projects. In CSEDU24. https://doi.org/10.5220/0012729400003693

work page doi:10.5220/0012729400003693 2024
[25]

Dreyfus and Hubert Dreyfus

S.E. Dreyfus and Hubert Dreyfus. 1980. A Five-Stage Model of the Mental Activities Involved in Directed Skill Acquisition. , 22 pages. https://apps.dtic.mil/sti/citations/ADA084551#:~:text=In%20acquiring%20a% 20skill%20by,%2C%20proficiency%2C%20expertise%20and%20mastery

work page 1980
[26]

Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayan- swamy, Joseph Breda, Xin Liu, Shwetak Patel, and Vikram Iyer. 2023. Exploring and Characterizing Large Language Models For Embedded System Development and Debugging. arXiv:2307.03817 [cs.SE]

work page arXiv 2023
[27]

Daniel Erhabor, Sreeharsha Udayashankar, Meiyappan Nagappan, and Samer Al-Kiswany. 2023. Measuring the Runtime Performance of Code Produced with GitHub Copilot. arXiv:2305.06439 [cs.SE] https://arxiv.org/abs/2305.06439

work page arXiv 2023
[28]

Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, and Shuvendu K. Lahiri. 2024. LLM-based Test-driven Interactive Code Gen- eration: User Study and Empirical Evaluation. arXiv:2404.10100 [cs.SE] https://arxiv.org/abs/2404.10100

work page arXiv 2024
[29]

Felicia Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, and Zhicong Lu. 2023. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming. arXiv:2310.09235 [cs.HC]

work page arXiv 2023
[30]

Sidong Feng and Chunyang Chen. 2024. Prompting Is All You Need: Auto- mated Android Bug Replay with Large Language Models. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 67, 13 pages. https://doi.org/10.1145/3597503.3608137

work page doi:10.1145/3597503.3608137 2024
[31]

James, Nadia Polikar- pova, and Sorin Lerner

Kasra Ferdowsi, Ruanqianqian Huang, Michael B. James, Nadia Polikar- pova, and Sorin Lerner. 2023. Live Exploration of AI-Generated Programs. arXiv:2306.09541 [cs.HC]

work page arXiv 2023
[32]

Becker, Andrew Luxton-Reilly, and James Prather

James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, and James Prather. 2022. The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. In Proceedings of the 24th Australasian Computing Education Conference (Virtual Event, Australia) (ACE ’22). Association for Computing Machinery, New York, NY, USA, ...

work page doi:10.1145/3511861.3511863 2022
[34]

Saki Imai. 2022. Is GitHub Copilot a Substitute for Human Pair-programming? An Empirical Study. In 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) . 319–321. https://doi. org/10.1145/3510454.3522684

work page doi:10.1145/3510454.3522684 2022
[35]

Dhanya Jayagopal, Justin Lubin, and Sarah E. Chasins. 2022. Exploring the Learnability of Program Synthesizers by Novice Programmers. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (Bend, OR, USA) (UIST ’22). Association for Computing Machinery, New York, NY, USA, Article 64, 15 pages. https://doi.org/10.1145/352...

work page doi:10.1145/3526113.3545659 2022
[36]

Ellen Jiang, Edwin Toh, Alejandra Molina, Kristen Olson, Claire Kayacik, Aaron Donsbach, Carrie J Cai, and Michael Terry. 2022. Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (, New Orleans, LA, USA,)(CHI ’22). Associatio...

work page doi:10.1145/3491102.3501870 2022
[37]

Yong Jing, Hao Wang, Xinyu Chen, et al . 2024. What factors will affect the effectiveness of using ChatGPT to solve programming problems? A quasi- experimental study. Humanities and Social Sciences Communications 11, 1 (2024),

work page 2024
[38]

https://doi.org/10.1057/s41599-024-02751-w

work page doi:10.1057/s41599-024-02751-w
[39]

Johnson, William Doss, and Christopher M

Daniel M. Johnson, William Doss, and Christopher M. Estepp. 2024. Using ChatGPT with Novice Arduino Programmers: Effects on Performance, Interest, Self-Efficacy, and Programming Ability. Journal of Research in Technical Careers 8, 1 (2024). https://doi.org/10.9741/2578-2118.1152

work page doi:10.9741/2578-2118.1152 2024
[40]

Breanna Jury, Angela Lorusso, Juho Leinonen, Paul Denny, and Andrew Luxton- Reilly. 2024. Evaluating LLM-generated Worked Examples in an Introductory Programming Course. In Proceedings of the 26th Australasian Computing Educa- tion Conference (Sydney, NSW, Australia)(ACE ’24). Association for Computing Machinery, New York, NY, USA, 77–86. https://doi.org/...

work page doi:10.1145/3636243.3636252 2024
[41]

Ulas Berk Karli, Juo-Tung Chen, Victor Nikhil Antony, and Chien-Ming Huang

work page
[42]

In Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (Boulder, CO, USA) (HRI ’24)

Alchemist: LLM-Aided End-User Development of Robot Applications. In Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (Boulder, CO, USA) (HRI ’24). Association for Computing Machinery, New York, NY, USA, 361–370. https://doi.org/10.1145/3610977.3634969 Conference’17, July 2017, Washington, DC, USA Etsenake and Nagappan

work page doi:10.1145/3610977.3634969 2024
[43]

Ericson, David Weintrop, and Tovi Grossman

Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J. Ericson, David Weintrop, and Tovi Grossman. 2023. Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery,...

work page doi:10.1145/3544548.3580919 2023
[44]

Ericson, David Weintrop, and Tovi Grossman

Majeed Kazemitabaar, Xinying Hou, Austin Henley, Barbara J. Ericson, David Weintrop, and Tovi Grossman. 2023. How Novices Use LLM-Based Code Gen- erators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment. arXiv:2309.14049 [cs.HC]

work page arXiv 2023
[45]

Majeed Kazemitabaar, Runlong Ye, Xiaoning Wang, Austin Zachary Henley, Paul Denny, Michelle Craig, and Tovi Grossman. 2024. CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs. In Proceedings of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Asso...

work page doi:10.1145/3613904.3642773 2024
[46]

Ranim Khojah, Mazen Mohamad, Philipp Leitner, and Francisco Gomes de Oliveira Neto. 2024. Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice. arXiv:2404.14901 [cs.SE] https://arxiv.org/abs/2404.14901

work page arXiv 2024
[47]

Nam Wook Kim, Hyung-Kwon Ko, Grace Myers, and Benjamin Bach

work page
[48]

arXiv:2405.00748 [cs.HC] https://arxiv.org/abs/2405.00748

ChatGPT in Data Visualization Education: A Student Perspective. arXiv:2405.00748 [cs.HC] https://arxiv.org/abs/2405.00748

work page arXiv
[49]

Tae Soo Kim, DaEun Choi, Yoonseo Choi, and Juho Kim. 2022. Stylette: Styling the Web with Natural Language. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (, New Orleans, LA, USA,) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 5, 17 pages. https://doi.org/10.1145/3491102.3501931

work page doi:10.1145/3491102.3501931 2022
[50]

Tomaž Kosar, Dragana Ostojić, Yu David Liu, and Marjan Mernik. 2024. Com- puter Science Education in ChatGPT Era: Experiences from an Experiment in a Programming Course for Novice Programmers. Mathematics 12, 5 (2024). https://doi.org/10.3390/math12050629

work page doi:10.3390/math12050629 2024
[51]

Kimio Kuramitsu, Yui Obara, Miyu Sato, and Momoka Obara. 2023. KOGI: A Seamless Integration of ChatGPT into Jupyter Environments for Programming Education. In Proceedings of the 2023 ACM SIGPLAN International Symposium on SPLASH-E (Cascais, Portugal) (SPLASH-E 2023). Association for Computing Machinery, New York, NY, USA, 50–59. https://doi.org/10.1145/36...

work page doi:10.1145/3622780.3623648 2023
[52]

Mark Liffiton, Brad E Sheese, Jaromir Savelka, and Paul Denny. 2024. Code- Help: Using Large Language Models with Guardrails for Scalable Support in Programming Classes. In Proceedings of the 23rd Koli Calling International Con- ference on Computing Education Research (Koli, Finland) (Koli Calling ’23). As- sociation for Computing Machinery, New York, NY,...

work page doi:10.1145/3631802.3631830 2024
[53]

Jinrun Liu, Xinyu Tang, Linlin Li, Panpan Chen, and Yepang Liu. 2023. Which is a better programming assistant? A comparative study between chatgpt and stack overflow. arXiv:2308.13851 [cs.SE] https://arxiv.org/abs/2308.13851

work page arXiv 2023
[54]

Jiaqi Liu, Fengming Zhang, Xin Zhang, Zhiwen Yu, Liang Wang, Yao Zhang, and Bin Guo. 2024. hmCodeTrans: Human–Machine Interactive Code Translation. IEEE Transactions on Software Engineering 50, 5 (2024), 1163–1181. https: //doi.org/10.1109/TSE.2024.3379583

work page doi:10.1109/tse.2024.3379583 2024
[55]

What It Wants Me To Say

Michael Xieyang Liu, Advait Sarkar, Carina Negreanu, Benjamin Zorn, Jack Williams, Neil Toronto, and Andrew D. Gordon. 2023. “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models. In Proceedings of the 2023 CHI Con- ference on Human Factors in Computing Systems (, Hamburg, Germany,) ...

work page doi:10.1145/3544548.3580817 2023
[56]

Qianou Ma, Hua Shen, Kenneth Koedinger, and Tongshuang Wu. 2023. HypoCompass: Large-Language-Model-based Tutor for Hypothesis Construc- tion in Debugging for Novices. arXiv:2310.05292 [cs.HC]

work page arXiv 2023
[57]

Stephen MacNeil, Andrew Tran, Arto Hellas, Joanne Kim, Sami Sarsa, Paul Denny, Seth Bernstein, and Juho Leinonen. 2023. Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (, Toronto ON, Canada,) (SIGCSE 2023)...

work page doi:10.1145/3545945.3569785 2023
[58]

Desmarais, and Zhen Ming (Jack) Jiang

Arghavan Moradi Dakhel, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, and Zhen Ming (Jack) Jiang. 2023. GitHub Copilot AI pair programmer: Asset or Liability? Journal of Systems and Software 203 (2023), 111734. https://doi.org/10.1016/j.jss.2023.111734

work page doi:10.1016/j.jss.2023.111734 2023
[59]

Hussein Mozannar, Gagan Bansal, Adam Fourney, and Eric Horvitz. 2023. Read- ing Between the Lines: Modeling User Behavior and Costs in AI-Assisted Pro- gramming. arXiv:2210.14306 [cs.SE]

work page arXiv 2023
[60]

Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. 2024. Using an LLM to Help With Code Understanding. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 97, 13 pages. https://doi.org/10.1145/359750...

work page doi:10.1145/3597503.3639187 2024
[61]

Mohamed Nejjar, Luca Zacharias, Fabian Stiehle, and Ingo Weber. 2024. LLMs for Science: Usage for Code Generation and Data Analysis. arXiv:2311.16733 [cs.SE] https://arxiv.org/abs/2311.16733

work page arXiv 2024
[62]

Sydney Nguyen, Hannah McLean Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, and Molly Q Feldman. 2024. How Beginning Programmers and Code LLMs (Mis)read Each Other. In Proceedings of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 651, 26 pages. ...

work page doi:10.1145/3613904.3642706 2024
[63]

Sanghak Oh, Kiho Lee, Seonhye Park, Doowon Kim, and Hyoungshick Kim

work page
[64]

arXiv:2312.06227 [cs.CR] https://arxiv.org/abs/2312.06227

Poisoned ChatGPT Finds Work for Idle Hands: Exploring Develop- ers’ Coding Practices with Insecure Suggestions from Poisoned AI Models. arXiv:2312.06227 [cs.CR] https://arxiv.org/abs/2312.06227

[65]

Abdessalam Ouaazki, Kristoffer Bergram, and Adrian Holzer. 2023. Leveraging ChatGPT to Enhance Computational Thinking Learning Experiences. In 2023 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE). 1–7. https://doi.org/10.1109/TALE56641.2023.10398358

[66]

Eng Lieh Ouh, Benjamin Kok Siew Gan, Kyong Jin Shim, and Swavek Wlodkowski. 2023. ChatGPT, Can You Generate Solutions for My Coding Exercises? An Evaluation on Its Effectiveness in an Undergraduate Java Programming Course. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 (Turku, Finland) (ITiCSE ...

https://doi.org/10.1145/3587102.3588794
[67]

Omer Said Ozturk, Emre Ekmekcioglu, Orcun Cetin, Budi Arief, and Julio Hernandez-Castro. 2023. New Tricks to Old Codes: Can AI Chatbots Replace Static Code Analysis Tools?. In Proceedings of the 2023 European Interdisciplinary Cybersecurity Conference (Stavanger, Norway) (EICC ’23). Association for Computing Machinery, New York, NY, USA, 13–18. http...

https://doi.org/10.1145/3590777.3590780
[68]

Evan W. Patton, David Y. J. Kim, Ashley Granquist, Robin Liu, Arianna Scott, Jennet Zamanova, and Harold Abelson. 2024. Aptly: Making Mobile Apps from Natural Language. arXiv:2405.00229 [cs.HC] https://arxiv.org/abs/2405.00229

[69]

Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv:2302.06590 [cs.SE]

[70]

Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2023. Do Users Write More Insecure Code with AI Assistants?. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen, Denmark) (CCS ’23). Association for Computing Machinery, New York, NY, USA, 2785–2799. https://doi.org/10.1145/3576915.3623157

[71]

Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi

[72]

Generating Programs Trivially: Student Use of Large Language Models. In Proceedings of the ACM Conference on Global Computing Education Vol 1 (Hyderabad, India) (CompEd 2023). Association for Computing Machinery, New York, NY, USA, 126–132. https://doi.org/10.1145/3576882.3617921

[73]

James Prather, Brent N. Reeves, Paul Denny, Brett A. Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, and Eddie Antonio Santos. 2023. “It’s Weird That It Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Trans. Comput.-Hum. Interact. (aug 2023). https://doi.org/10.1145/3617367 Just Accepted

[74]

Kevin Pu, Jim Yang, Angel Yuan, Minyi Ma, Rui Dong, Xinyu Wang, Yan Chen, and Tovi Grossman. 2023. DiLogics: Creating Web Automation Programs with Diverse Logics. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Articl...

https://doi.org/10.1145/3586183.3606822
[75]

Crystal Qian and James Wexler. 2024. Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration. In Proceedings of the 29th International Conference on Intelligent User Interfaces (Greenville, SC, USA) (IUI ’24). Association for Computing Machinery, New York, NY, USA, 370–384. https://doi.org/10.1145/3640543.3645198

[76]

Nikitha Rao, Jason Tsay, Kiran Kate, Vincent Hellendoorn, and Martin Hirzel

[77]

AI for Low-Code for AI. In Proceedings of the 29th International Conference on Intelligent User Interfaces (Greenville, SC, USA) (IUI ’24). Association for Computing Machinery, New York, NY, USA, 837–852. https://doi.org/10.1145/3640543.3645203

[78]

Steven I. Ross, Fernando Martinez, Stephanie Houde, Michael Muller, and Justin D. Weisz. 2023. The Programmer’s Assistant: Conversational Interaction with a Large Language Model for Software Development. In Proceedings of the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23). Association for Computing Machiner...

https://doi.org/10.1145/3581641.3584037
[79]

Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, and Brendan Dolan-Gavitt. 2023. Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants. In 32nd USENIX Security ...

[80]

Advait Sarkar, Andrew D. Gordon, Carina Negreanu, Christian Poelitz, Sruti Srinivasa Ragavan, and Ben Zorn. 2022. What is it like to program with artificial intelligence? arXiv:2208.06213 [cs.HC]

[81]

Jaromir Savelka, Arav Agarwal, Christopher Bogart, Yifan Song, and Majd Sakr


