Teaching Prompt-Based Programming with LLMs: A 45-Minute Lesson with Guided Practice for End-User Programmers

Keith Tran; Samiha Marwan; Thomas Price

arXiv: 2606.30547 · v1 · pith:YEO56TBX · submitted 2026-06-29 · 💻 cs.CY

Pith reviewed 2026-06-30 03:23 UTC · model grok-4.3

classification 💻 cs.CY

keywords prompt-based programming, large language models, computer science education, end-user programming, prompting skills, self-efficacy, randomized controlled study

The pith

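Compared with an equal-length code-tracing activity, a 45-minute lesson on prompt-based programming produces a modest, statistically non-significant advantage in students' ability to specify computational goals to LLMs and a significant advantage in prompting self-efficacy.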

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates whether a short lesson can teach end-user programmers to communicate computational goals clearly to LLMs through natural language prompts. In a randomized controlled study with 55 engineering students, the 45-minute lesson with guided practice led to average pre-to-post gains of 10.8 percentage points on a goal-specification test, compared to 1.1 points for a code-tracing control activity of equal length, although this difference was not statistically significant. The experimental group also showed significantly larger increases in prompting self-efficacy. This matters because prompt-based programming lowers entry barriers for non-CS learners yet still requires skill in avoiding vague prompts, and the study checks whether such skill can be taught efficiently within existing course schedules. The results suggest that brief instruction may produce some improvement, though the modest, non-significant test-score effect points to the need for more extensive practice.

Core claim

Our results suggest it is likely that a brief intervention can improve learners' ability to specify computational goals to LLMs. However, the effect was modest, suggesting that prompting skills may require more time and practice to develop. We provide a lightweight lesson that requires no prior CS background and can be readily dropped into existing courses.

What carries the argument

The 45-minute prompt-based programming lesson with guided practice, evaluated through randomized comparison against a code-tracing control on pre-post changes in goal-specification performance and self-efficacy.

Load-bearing premise

The pre- and post-tests accurately measure the ability to specify computational goals to LLMs, and the control condition represents a class without prompt-focused instruction.

What would settle it

A larger replication study finding no difference in pre-to-post gains on the goal-specification test between the prompting lesson and code-tracing groups would indicate the intervention does not improve the targeted skill.
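How large would such a replication need to be? A minimal sketch, assuming a two-sided, two-sample comparison of gain scores at α = 0.05 with 80% power and a standardized effect size d; it uses the normal approximation, and `n_per_group` is a hypothetical helper, not anything from the paper.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate students needed per condition to detect effect size d
    (two-sided, two-sample test, normal approximation)."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_pow = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_crit + z_pow) / d) ** 2)

for d in (0.3, 0.5, 0.8):
    print(f"d={d}: ~{n_per_group(d)} students per condition")
```

Under these assumptions, targeting d = 0.5 needs roughly 63 students per condition (about 126 in total), more than twice the present sample of 55; an exact t-test calculation gives a slightly larger figure.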

Figures

Figures reproduced from arXiv: 2606.30547 by Keith Tran, Samiha Marwan, Thomas Price.

Original abstract

Prompt-based programming, a new modality enabled by large language models (LLMs), allows users to express computational goals through natural language rather than traditional code. While this approach lowers barriers to entry, especially for non-CS learners, it does not eliminate the need for foundational CS skills. Learners often struggle to communicate their intent clearly to LLMs, resulting in vague or underspecified prompts. Prior work has documented the need for explicit prompting for both CS and non-CS learners. However, it remains less clear how such instruction can fit into busy classrooms or how much time is needed to produce meaningful gains. In this paper, we evaluated a 45-minute prompt-based programming intervention, consisting of a lesson with guided practice, against a business-as-usual CS lab activity (code tracing) of equal length, representing a class without prompt-focused instruction. We conducted a randomized controlled study with 55 engineering students. We found that students in the experimental condition improved more on average (though not significantly more) from pre- to post-test than the control group (+10.8 vs +1.1 percentage points) and showed significantly greater average gains in prompting self-efficacy (+35.4 vs +21.9 percentage points). Our results suggest it is likely that a brief intervention can improve learners' ability to specify computational goals to LLMs. However, the effect was modest, suggesting that prompting skills may require more time and practice to develop. We provide a lightweight lesson that requires no prior CS background and can be readily dropped into existing courses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

A small RCT finds a 45-minute prompting lesson boosts self-efficacy more than code tracing but the performance gain is non-significant and the tests lack validation details.

The core takeaway is that this 45-minute intervention produced a statistically significant self-efficacy gain (+35.4 vs +21.9 pp) but only a non-significant performance bump (+10.8 vs +1.1 pp) on the pre/post tests. The study is a straightforward randomized trial with 55 engineering students comparing the lesson against a code-tracing control.
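Both headline comparisons are differences in mean gains: the reported figures imply an advantage of 9.7 percentage points on the test (10.8 - 1.1) and 13.5 points on self-efficacy (35.4 - 21.9). A minimal sketch of that arithmetic on made-up paired scores (not the study's data); `mean_gain` is a hypothetical helper.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average pre-to-post change, in percentage points, for one condition."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired per student")
    return mean(after - before for before, after in zip(pre, post))

# Hypothetical percent-correct scores; NOT the study's data.
experimental_pre = [40.0, 55.0, 60.0]
experimental_post = [55.0, 60.0, 70.0]
control_pre = [45.0, 50.0, 65.0]
control_post = [45.0, 55.0, 65.0]

exp_gain = mean_gain(experimental_pre, experimental_post)
ctl_gain = mean_gain(control_pre, control_post)
print(f"experimental +{exp_gain:.1f} pp, control +{ctl_gain:.1f} pp, "
      f"difference {exp_gain - ctl_gain:.1f} pp")
```

The gains are in percentage points because they subtract two percentages; a rise from 40% to 55% is +15 points, not +15%.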

What is new is the concrete, time-bounded lesson design plus the original RCT data. Prior work had flagged the need for prompting instruction; this paper tests whether a single short session can move the needle in a real classroom slot. The authors also supply the lesson materials so others can try it.

The paper does a clean job of reporting both performance and self-efficacy outcomes and of keeping the intervention lightweight. It is honest about the modest size of the effects and the fact that prompting skills probably need more than 45 minutes.

The soft spot is the performance measure itself. The difference is not significant, and the abstract gives no item-level data, scoring rubric, or external check that the test actually captures the ability to specify goals to LLMs. The control condition is reasonable, but the validity concern is real: without that check, the non-significant result could simply reflect a noisy test. The sample size is modest, so statistical power is limited.
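To make "power is limited" concrete, here is a minimal sketch of a normal-approximation power calculation for comparing gain scores between two groups. It assumes a 28/27 split of the 55 students (the actual group sizes are not reported here), a two-sided test at α = 0.05, and a standardized effect size d; the function names are hypothetical. The normal approximation slightly overstates power relative to an exact t-test, so treat the numbers as rough.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for effect size d."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # shift of the test statistic under d
    # Probability of landing in either rejection region.
    return (1 - Z.cdf(z_crit - ncp)) + Z.cdf(-z_crit - ncp)

def min_detectable_d(n1, n2, alpha=0.05, power=0.80):
    """Smallest effect size detectable with the given power (same approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_pow = Z.inv_cdf(power)
    return (z_crit + z_pow) / sqrt(n1 * n2 / (n1 + n2))

# Assumed split of the 55 students into two conditions.
n_exp, n_ctl = 28, 27
for d in (0.2, 0.5, 0.8):
    print(f"d={d}: power ~ {approx_power(d, n_exp, n_ctl):.2f}")
print(f"minimum detectable d at 80% power ~ {min_detectable_d(n_exp, n_ctl):.2f}")
```

Under these assumptions the minimum detectable effect is near d ≈ 0.76, so a moderate effect (d = 0.5) would be missed more often than not.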

This paper is for CS educators who want a ready-to-use module on prompting. A reader who teaches intro courses or works on LLM literacy will get practical value from the lesson plan even if the results stay tentative.

It deserves peer review. The design is simple and the data are original; referees can push on the test validity and suggest a larger follow-up without the paper being desk-rejected.

Referee Report

2 major / 1 minor

Summary. The paper reports results from a randomized controlled study (n=55 engineering students) comparing a 45-minute prompt-based programming lesson with guided practice against an equal-length code-tracing control activity. The experimental group showed larger average pre-to-post gains on a test of specifying computational goals to LLMs (+10.8 vs +1.1 percentage points, not statistically significant) and significantly larger gains in prompting self-efficacy (+35.4 vs +21.9 percentage points). The authors conclude that a brief intervention can produce modest improvements in learners' ability to specify goals to LLMs but that more time and practice may be needed.

Significance. If the central empirical result holds after addressing measurement concerns, the work provides a practical, lightweight lesson plan that can be inserted into existing non-CS courses and supplies RCT evidence on the time investment required for prompting skill development. The equal-duration control condition and direct measurement of both performance and self-efficacy are strengths that allow clear comparison to business-as-usual instruction.

major comments (2)

[Abstract / Results] The primary claim that the intervention improves ability to specify computational goals rests on the +10.8 vs +1.1 pp performance difference, yet this difference is reported as non-significant. With n=55 the study is under-powered for detecting modest effects, so the ability conclusion should be qualified or supported by additional analyses (e.g., effect-size reporting, per-item breakdowns, or power calculations).
[Methods] No item-level description of the pre/post tests, scoring rubric, inter-rater reliability, or external validation (e.g., correlation with actual LLM prompt quality) is provided. Without these details it is impossible to confirm that the tests validly measure the target construct of goal-specification ability rather than general test-taking or domain knowledge.
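The effect-size reporting requested in the first major comment could take the form of Cohen's d on per-student gain scores. A minimal sketch using a pooled sample standard deviation; the data and the `cohens_d` name are hypothetical, and the paper may choose a different standardizer or a model-based estimate.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD (n-1)."""
    na, nb = len(group_a), len(group_b)
    sa, sb = stdev(group_a), stdev(group_b)
    pooled = sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled

# Hypothetical per-student gains in percentage points; NOT the study's data.
exp_gains = [20.0, 10.0, 0.0, 15.0, 5.0]
ctl_gains = [5.0, 0.0, -5.0, 10.0, 0.0]
print(f"d = {cohens_d(exp_gains, ctl_gains):.2f}")
```

Reporting d alongside the raw difference in percentage points lets readers judge both practical size and comparability with other studies.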

minor comments (1)

[Abstract] The abstract states the performance gain is 'not significantly more' while the self-efficacy gain is 'significantly greater'; these qualifiers should be repeated consistently in the discussion to avoid over-statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of results and methods.

Point-by-point responses

Referee: [Abstract / Results] The primary claim that the intervention improves ability to specify computational goals rests on the +10.8 vs +1.1 pp performance difference, yet this difference is reported as non-significant. With n=55 the study is under-powered for detecting modest effects, so the ability conclusion should be qualified or supported by additional analyses (e.g., effect-size reporting, per-item breakdowns, or power calculations).

Authors: We agree that the performance gain difference is non-significant and that the study is likely underpowered to detect modest effects. The manuscript already qualifies the conclusion by stating the difference is 'not significantly more' and describing the effect as 'modest,' with the explicit suggestion that more time and practice may be needed. To further address the concern, we will add effect-size reporting (Cohen's d for the performance and self-efficacy measures), a post-hoc power analysis, and a brief per-item breakdown of the test in the Results section. These additions will support the qualified claims without overstating the findings. Revision: yes.
Referee: [Methods] No item-level description of the pre/post tests, scoring rubric, inter-rater reliability, or external validation (e.g., correlation with actual LLM prompt quality) is provided. Without these details it is impossible to confirm that the tests validly measure the target construct of goal-specification ability rather than general test-taking or domain knowledge.

Authors: We acknowledge the absence of these methodological details in the current version. We will expand the Methods section to include sample pre/post test items, the full scoring rubric, inter-rater reliability statistics (two independent raters scored all responses), and a description of how the test was designed to isolate goal-specification ability. We will also note the lack of direct external validation against the quality of the prompts learners actually write for LLMs as a limitation, and discuss the test's alignment with the target construct. Revision: yes.
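The promised inter-rater reliability statistic is not named. One common choice for two raters assigning categorical rubric scores is Cohen's kappa, sketched minimally below; the rubric scale (0-2), the ratings, and the `cohens_kappa` name are all hypothetical. For ordinal scores a weighted kappa or an intraclass correlation may be more appropriate.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same responses."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    # Observed agreement: share of responses given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal rates, summed over categories.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores from two raters; NOT the study's data.
r1 = [2, 1, 0, 2, 1, 2, 0, 1]
r2 = [2, 1, 0, 1, 1, 2, 0, 2]
print(f"kappa = {cohens_kappa(r1, r2):.3f}")
```

Kappa discounts the agreement two raters would reach by chance, which raw percent agreement (here 75%) does not.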

Circularity Check

0 steps flagged

No circularity: direct empirical RCT with measured outcomes

full rationale

The paper is a randomized controlled study reporting pre/post score changes from an educational intervention. No equations, derivations, fitted parameters, or self-citation chains exist that reduce any result to its own inputs by construction. Claims rest on direct measurement of test scores and self-efficacy, which are independent of any prior fitted quantities within the paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of the pre/post measures as proxies for prompting ability and the assumption that randomization produced comparable groups with differences attributable to the lesson.

axioms (2)

Domain assumption: The pre- and post-tests validly measure learners' ability to specify computational goals to LLMs.
The study interprets score changes as evidence of improved prompting skill.
Domain assumption: Random assignment to conditions produced balanced groups.
This is the standard assumption required to attribute outcome differences to the intervention.
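The randomization assumption is easiest to audit when the allocation is reproducible. A minimal sketch of simple (unstratified) random assignment with a recorded seed, assuming student IDs 1-55 and a near-even split; the study's actual allocation procedure is not described here, and `assign_conditions` is a hypothetical helper.

```python
import random

def assign_conditions(student_ids, seed):
    """Randomly split students into two conditions of near-equal size."""
    ids = list(student_ids)
    rng = random.Random(seed)  # seeded so the allocation can be reproduced and audited
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2  # the extra student, if any, goes to the first group
    return {"experimental": ids[:half], "control": ids[half:]}

groups = assign_conditions(range(1, 56), seed=2026)
print({name: len(members) for name, members in groups.items()})
# {'experimental': 28, 'control': 27}
```

Simple randomization can still leave small samples imbalanced, so comparing pretest means across the two groups is a common follow-up check.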

pith-pipeline@v0.9.1-grok · 5814 in / 1418 out tokens · 67016 ms · 2026-06-30T03:23:28.175057+00:00 · methodology
