Combating Harms of Generative AI in CS1 with Code Review Interviews and a Flipped Classroom

Erik Falor; John Edwards; Peter Fowles; Seth Poulsen; Sulove Bhattarai

arxiv: 2605.21374 · v1 · pith:CYHQFLNSnew · submitted 2026-05-20 · 💻 cs.HC

Pith reviewed 2026-05-21 03:29 UTC · model grok-4.3

classification 💻 cs.HC

keywords: generative AI in education · CS1 pedagogy · oral code reviews · flipped classroom · LLM usage · student learning outcomes · keystroke logging · formative assessment

The pith

Oral code reviews in a flipped CS1 class preserve student understanding even as LLM usage for assignments rises sharply.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to address the problem of students using large language models to complete programming assignments without learning the underlying concepts. It proposes weekly oral code review interviews where students explain their code, paired with a flipped classroom that frees up time for these reviews. Data from three semesters shows that exam scores stayed about the same or slightly higher in the new format, while keystroke logs indicate much more code was pasted from AI sources. Surveys found students generally liked the code reviews, with complaints mainly about scheduling. This suggests the approach lets students use AI tools freely while still building real understanding.

Core claim

Through oral code review assessments and a flipped classroom in CS1, students maintain adequate performance on written exams despite a dramatic increase in the use of generative AI for completing coding assignments. Analysis of exam scores across semesters shows no statistically significant decline, keystroke data confirms higher AI involvement through increased pasting, and end-of-semester surveys indicate positive attitudes toward the interviews.

What carries the argument

Weekly oral code review interviews, in which students must explain and justify the code they submitted for assignments, supported by a flipped classroom model in which students learn concepts outside of class, leaving ample time to schedule the interviews.

If this is right

Students retain conceptual understanding as measured by exam performance even when relying more on AI for code production.
Formative oral assessments can incentivize metacognitive engagement with code regardless of its source.
Positive student feedback supports the feasibility of scaling this model with improvements in scheduling and training.
The combination allows experimentation with AI tools without apparent harm to foundational learning in introductory programming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar oral review methods might transfer to other introductory STEM courses facing AI tools.
Institutions could shift from detection and punishment of AI use toward redesigned assessments that reward explanation.
Repeated oral reviews may build stronger habits of code comprehension beyond what written exams alone achieve.
Data on long-term retention or transfer to new problems would further test the approach.

Load-bearing premise

Oral code review interviews accurately measure and promote deep conceptual understanding instead of students simply preparing rote explanations for the interview.

What would settle it

A follow-up assessment where students must solve new programming problems or explain code modifications without prior preparation would show significantly lower performance if the reviews only encourage superficial knowledge.

Figures

Figures reproduced from arXiv: 2605.21374 by Erik Falor, John Edwards, Peter Fowles, Seth Poulsen, Sulove Bhattarai.

**Figure 1.** Distribution of exam scores across the Fall 2023, Fall 2024, and CS1-CR semesters. Results on both exams were relatively stable

**Figure 2.** Difference in paste events in coding assignments between the Fall 2024 and CS1-CR semesters. (a) Distribution of the total

**Figure 3.** (a) Distribution of number of characters pasted in each submission. (b) Distribution of percentage of characters pasted per total

**Figure 4.** Results from multiple choice survey questions distributed to CS1-CR students at the end of the CS1-CR semester.

**Figure 5.** Results from Likert scale survey questions distributed to CS1-CR students at the end of the CS1-CR semester. These results

Original abstract

Background and Context: Large Language Models (LLMs) are more accessible and accurate than ever before, raising significant concerns for computing educators. One major concern is students using LLMs to bypass the effort needed to understand concepts and metacognitive strategies essential for success in computer science.

Objectives: We contribute a unique approach to assessing and building up student understanding through weekly oral code review assessments. These formative assessments incentivize students to understand their submitted code, regardless of whether or not the code was generated by AI tools. We also use a flipped classroom to provide time for students to learn concepts outside of class and provide ample time for students to schedule code review interviews.

Methods: For this paper, we collected data from three semesters. We analyze student exam scores, keystroke logs, and surveys to understand how the new course policies affected student learning, behavior, and attitudes.

Findings: Pairwise comparison of exam results reveals a statistically insignificant increase in average scores for Fall 2025 compared to previous semesters. Keystroke logs show a significant increase in characters pasted per total characters input into coding assignments in Fall 2025, pointing towards higher AI usage. Survey results show positive student sentiment towards code reviews at the end of Fall 2025, with nearly all negative feedback being addressable through better scheduling and more rigorous TA training.

Implications: Oral code reviews with a flipped classroom appear to be effective at mitigating harms of LLM use while providing space for students to freely experiment with these tools. Our work suggests that students in Fall 2025 still show adequate understanding of material covered in written exams, despite dramatic increases in LLM usage for coding assignments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Oral code reviews plus flipped classroom in CS1 hold exam scores steady amid more AI pasting, but the data does not clearly tie the interviews to genuine conceptual gains.

The paper describes a flipped CS1 setup with weekly oral code review interviews to push students toward understanding their code even when they use LLMs for assignments. The authors collected data from three semesters, compared exam averages, tracked paste rates in keystroke logs, and surveyed students at the end of the intervention semester. Scores showed a small, statistically insignificant rise in the intervention semester while pasting increased, and students gave mostly positive feedback, with complaints centered on scheduling and TA preparation.

Referee Report

3 major / 2 minor

Summary. The manuscript describes an educational intervention in a CS1 course that combines weekly oral code review interviews with a flipped classroom to mitigate harms from students using LLMs to bypass conceptual understanding and metacognitive strategies. Drawing on data from three semesters, the authors compare exam scores, keystroke logs showing pasted characters, and end-of-semester surveys, reporting a statistically insignificant increase in average exam scores in Fall 2025 alongside a significant rise in pasting (indicating higher LLM usage) and generally positive student feedback on the code reviews.

Significance. If the methodological gaps are addressed, the work could provide a practical, replicable model for CS educators seeking to accommodate AI tools while using formative oral assessments to promote genuine comprehension. The multi-method approach (logs + exams + surveys) offers a useful template for studying behavioral changes in AI-augmented programming courses.

major comments (3)

[Methods] Methods: The description of data collection and analysis lacks sample sizes per semester, the exact statistical tests and p-values for the pairwise exam-score comparisons, effect sizes, and any controls or covariates for semester-to-semester differences in student population, exam difficulty, or course content.
[Findings] Findings and Implications: The claim that stable exam scores demonstrate 'adequate understanding' and mitigation of LLM harms assumes written exams assess the explanatory and debugging skills practiced in oral reviews, yet no example exam items, alignment analysis, or evidence ruling out superficial interview preparation is provided.
[Findings] Findings: The significant increase in pasted characters is presented as evidence of higher AI usage, but without baseline pasting rates from prior semesters or validation that pasting correlates with LLM rather than other copy-paste behaviors, the link to the intervention's effectiveness remains indirect.

minor comments (2)

[Abstract] Abstract: The abstract refers to 'three semesters' and 'Fall 2025' without naming the comparison semesters or clarifying whether the prior data were collected under identical course policies.
[Findings] Survey results: Positive sentiment is reported, but response rate, number of respondents, and breakdown of negative feedback categories should be included to evaluate how representative the views are.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas to strengthen the manuscript. We address each major comment below and have revised the paper accordingly to improve methodological transparency and the presentation of findings. We believe these changes enhance the work without altering its core contributions.

Referee: [Methods] Methods: The description of data collection and analysis lacks sample sizes per semester, the exact statistical tests and p-values for the pairwise exam-score comparisons, effect sizes, and any controls or covariates for semester-to-semester differences in student population, exam difficulty, or course content.

Authors: We agree that these details were insufficiently reported in the original submission. In the revised manuscript, we have added the per-semester sample sizes, specified that pairwise comparisons used independent-samples t-tests, reported exact p-values along with Cohen's d effect sizes, and included a discussion of potential covariates (e.g., student demographics and minor variations in exam content). We also explicitly note the limitations of retrospective data collection in fully controlling for all semester-to-semester differences. These additions directly address the transparency concerns.

Revision: yes

Referee: [Findings] Findings and Implications: The claim that stable exam scores demonstrate 'adequate understanding' and mitigation of LLM harms assumes written exams assess the explanatory and debugging skills practiced in oral reviews, yet no example exam items, alignment analysis, or evidence ruling out superficial interview preparation is provided.

Authors: This is a fair critique of the linkage between assessment types. While the manuscript presents stable exam scores as evidence of maintained understanding, we acknowledge the absence of explicit examples and formal alignment. In revision we will add representative exam items that require explanation and debugging, include a short description of how oral review skills map to exam performance, and discuss why weekly interviews covering all assignments make superficial preparation unlikely. We maintain that the multi-method data (exams + logs + surveys) supports our interpretation but will strengthen this section with the requested details.

Revision: partial

Referee: [Findings] Findings: The significant increase in pasted characters is presented as evidence of higher AI usage, but without baseline pasting rates from prior semesters or validation that pasting correlates with LLM rather than other copy-paste behaviors, the link to the intervention's effectiveness remains indirect.

Authors: The manuscript does report a statistically significant increase in pasted characters for Fall 2025 relative to prior semesters, which implies baseline data were analyzed. However, we agree that explicit baseline rates and stronger validation of the pasting metric were not detailed enough. In the revision we will report the actual prior-semester pasting percentages for direct comparison and add discussion of why we interpret elevated pasting as primarily LLM-related (e.g., log timing patterns and assignment context), while acknowledging that other copy-paste sources could contribute. This will make the behavioral evidence more explicit.

Revision: yes
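
For readers unfamiliar with the statistics named in the first response, the following is a minimal sketch of an equal-variance independent-samples t statistic and Cohen's d with a pooled standard deviation. It is not the authors' analysis code: the function names are illustrative, and the score lists are made-up toy values, not data from the paper.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled sample standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(a, b):
    """Cohen's d: difference in means expressed in pooled-SD units."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Independent-samples t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb))


# Made-up toy scores for illustration only; NOT data from the paper.
intervention = [2, 4, 6]
baseline = [1, 3, 5]

d = cohens_d(intervention, baseline)
t = student_t(intervention, baseline)
print(f"Cohen's d = {d:.3f}, t = {t:.3f}")
```

Turning the t statistic into a p-value requires the t distribution with `na + nb - 2` degrees of freedom; in practice a library routine such as `scipy.stats.ttest_ind` would likely be used for that step.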

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons with no derivations or self-referential reductions

This is an empirical education research paper that reports data collection from three semesters, including exam score comparisons, keystroke log analysis for paste rates, and end-of-semester surveys. The central findings rest on direct statistical pairwise comparisons and descriptive survey results rather than any mathematical derivation, fitted model, uniqueness theorem, or ansatz that could reduce to its own inputs by construction. No equations, predictions, or load-bearing self-citations appear in the provided text; the argument is self-contained against the collected external benchmarks (prior semesters and student responses).

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work is an empirical education study that relies on standard statistical comparison methods and domain assumptions about what exam scores and interview performance indicate; no new free parameters, invented entities, or ad-hoc axioms are introduced.

axioms (2)

domain assumption: Pairwise semester comparisons of exam scores are valid when course content and grading standards remain comparable across terms.
Invoked when reporting insignificant score changes between Fall 2025 and prior semesters.

domain assumption: Increased ratio of pasted characters to total keystrokes serves as a proxy for higher LLM usage.
Used to interpret the significant rise in pasted characters as evidence of greater AI assistance.
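
To make the second assumption concrete, here is a minimal sketch of how a pasted-character ratio could be computed from an edit log. The event format (`EditEvent` with a `kind` of `"type"` or `"paste"`) is hypothetical and chosen for illustration; the page does not describe the actual log schema or the authors' processing.

```python
from dataclasses import dataclass


@dataclass
class EditEvent:
    kind: str  # hypothetical labels: "type" for typed input, "paste" for pasted input
    text: str  # characters inserted by this event


def paste_ratio(events):
    """Fraction of all inserted characters that arrived via paste events."""
    pasted = sum(len(e.text) for e in events if e.kind == "paste")
    total = sum(len(e.text) for e in events if e.kind in ("type", "paste"))
    return pasted / total if total else 0.0


# Made-up example log for illustration only.
log = [
    EditEvent("type", "x = "),
    EditEvent("paste", "print(x)"),
    EditEvent("type", "\n"),
]
print(f"paste ratio = {paste_ratio(log):.3f}")
```

A ratio like this counts every paste the same way, whether the text came from an LLM, documentation, or the student's own earlier code, which is why the proxy is listed as an assumption rather than a measurement.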

[1] [1]

Elmira Adeeb and Kasia Muldner. 2025. How Do Novice Programmers Solve Code-Tracing Problems When ChatGPT Is Available? A Qualitative Analysis.. InProceedings of the 2025 ACM Conference on International Computing Education Research V. 1. 421–434

work page 2025

[2] [2]

Saleh Alhazbi. 2016. Using flipped classroom approach to teach computer programming. 441–444. doi:10.1109/TALE.2016.7851837

work page doi:10.1109/tale.2016.7851837 2016

[3] [3]

Becker, Paul Denny, James Finnie-Ansley, Andrew Luxton-Reilly, James Prather, and Eddie Antonio Santos

Brett A. Becker, Paul Denny, James Finnie-Ansley, Andrew Luxton-Reilly, James Prather, and Eddie Antonio Santos. 2023. Programming Is Hard - Or at Least It Used to Be: Educational Opportunities and Challenges of AI Code Generation. InProceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1(Toronto ON, Canada)(SIGCSE 2023). Associ...

work page doi:10.1145/3545945.3569759 2023

[4] [4]

Seth Bernstein, Ashfin Rahman, Nadia Sharifi, Ariunjargal Terbish, and Stephen MacNeil. 2025. Beyond the Benefits: A Systematic Review of the Harms and Consequences of Generative AI in Computing Education. InProceedings of the 25th Koli Calling International Conference on Computing Education Research (Koli Calling ’25). Association for Computing Machinery...

work page doi:10.1145/3769994.3770036 2025

[5] [5]

Jérôme Brender, Laila El-Hamamsy, Francesco Mondada, and Engin Bumbacher. 2024. Who’s helping who? when students use chatgpt to engage in practice lab sessions. InInternational Conference on Artificial Intelligence in Education. Springer, 235–249

work page 2024

[6] [6]

Yi-Hsing Chang, An-Ching Song, and Rong-Jyue Fang. 2018. The Study of Programming Language Learning by Applying Flipped Classroom. In 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII). 286–289. doi:10.1109/ICKII.2018.8569171

work page doi:10.1109/ickii.2018.8569171 2018

[7] Li Cheng, Albert Ritzhaupt, and Pavlo "Pasha" Antonenko. 2018. Effects of the flipped classroom instructional strategy on students’ learning outcomes: a meta-analysis. Educational Technology Research and Development 67 (10 2018). doi:10.1007/s11423-018-9633-7

[8] John Edwards. 2025. JetBrains Marketplace; ShowYourWork Plugin. https://plugins.jetbrains.com/plugin/18353-showyourwork

[9] Hasmik Gharibyan. 2005. Assessing students’ knowledge: oral exams vs. written tests. In Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (Caparica, Portugal) (ITiCSE ’05). Association for Computing Machinery, New York, NY, USA, 143–147. doi:10.1145/1067445.1067487

[10] Aashish Ghimire and John Edwards. 2024. Coding with AI: How are tools like ChatGPT being used by students in foundational programming courses. In International Conference on Artificial Intelligence in Education. Springer, 259–267.

[11] Aashish Ghimire and John Edwards. 2024. From Guidelines to Governance: A Study of AI Policies in Education. In Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky, Andrew M. Olney, Irene-Angelica Chounta, Zitao Liu, Olga C. Santos, ...

[12] Dirk Grunwald, Elizabeth Boese, Rhonda Hoenigman, Andy Sayler, and Judith Stafford. 2015. Personalized Attention @ Scale: Talk Isn’t Cheap, But It’s Effective. In Proceedings of the 46th ACM Technical Symposium on Computer Science Education (Kansas City, Missouri, USA) (SIGCSE ’15). ... doi:10.1145/2676723.2677283

[13] Kaden Hart, Christopher M Warren, and John Edwards. 2023. Accurate estimation of time-on-task while programming. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1. 708–714.

[14] Christopher Hundhausen, Anukrati Agrawal, Dana Fairbrother, and Michael Trevisan. 2009. Integrating pedagogical code reviews into a CS 1 course: an empirical study. SIGCSE Bull. 41, 1 (March 2009), 291–295. doi:10.1145/1539024.1508972

[15] Mark Huxham, Fiona Campbell, and Jenny Westwood. 2012. Oral versus written assessments: a test of student performance and attitudes. Assessment & Evaluation in Higher Education 37, 1 (2012), 125–136. doi:10.1080/02602938.2010.515012

[16] P. Iannone and A. Simpson. 2012. Oral assessment in mathematics: implementation and outcomes. Teaching Mathematics and its Applications: An International Journal of the IMA 31, 4 (10 2012), 179–190. https://academic.oup.com/teamat/article-pdf/31/4/179/4762864/hrs012.pdf doi:10.1093/teamat/hrs012

[17] Theresia Devi Indriasari, Andrew Luxton-Reilly, and Paul Denny. 2020. A Review of Peer Code Review in Higher Education. ACM Trans. Comput. Educ. 20, 3, Article 22 (Sept. 2020), 25 pages. doi:10.1145/3403935

[18] Gregor Jošt, Viktor Taneski, and Sašo Karakatič. 2024. The impact of large language models on programming education and student learning outcomes. Applied Sciences 14, 10 (2024), 4115.

[19] Matthew Kam, Cody Miller, Miaoxin Wang, Abey Tidwell, Irene A. Lee, Joyce Malyn-Smith, Beatriz Perret, Vikram Tiwari, Joshua Kenitzer, Andrew Macvean, and Erin Barrar. 2025. What do professional software developers need to know to succeed in an age of Artificial Intelligence?. In Proceedings of the 33rd ACM International Conference on the Foundations of So... doi:10.1145/3696630.3727251

[20] Sam Lau and Philip Guo. 2023. From "Ban It Till We Understand It" to "Resistance is Futile": How University Programming Instructors Plan to Adapt as More Students Use AI Code Generation and Explanation Tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1 (Chicago, IL, US... doi:10.1145/3568813.3600138

[21] Abdallah Mohamed. 2020. Evaluating the Effectiveness of Flipped Teaching in a Mixed-Ability CS1 Course. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education (Trondheim, Norway) (ITiCSE ’20). Association for Computing Machinery, New York, NY, USA, 452–458. doi:10.1145/3341525.3387395

[22] Peter Ohmann. 2019. An Assessment of Oral Exams in Introductory CS. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education (Minneapolis, MN, USA) (SIGCSE ’19). Association for Computing Machinery, New York, NY, USA, 613–619. doi:10.1145/3287324.3287489

[23] Peter Ohmann and Ed Novak. 2025. A Multi-Institutional Assessment of Oral Exams in Software Courses. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1 (Pittsburgh, PA, USA) (SIGCSETS 2025). Association for Computing Machinery, New York, NY, USA, 882–888. doi:10.1145/3641554.3701848

[24] James Prather, Juho Leinonen, Natalie Kiesler, Jamie Gorson Benario, Sam Lau, Stephen MacNeil, Narges Norouzi, Simone Opel, Vee Pettit, Leo Porter, Brent N. Reeves, Jaromir Savelka, David H. Smith IV, Sven Strickroth, and Daniel Zingaro. 2025. Beyond the Hype: A Comprehensive Review of Current Trends in Generative AI Research, Teaching Practices, and Too... doi:10.1145/3689187.3709614

[25] James Prather, Brent N Reeves, Paul Denny, Brett A Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, and Eddie Antonio Santos. 2023. “It’s weird that it knows what I want”: Usability and interactions with Copilot for novice programmers. ACM Transactions on Computer-Human Interaction 31, 1 (2023), 1–31.

[26] James Prather, Brent N Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S Randrianasolo, Brett A Becker, Bailey Kimmel, Jared Wright, and Ben Briggs. 2024. The widening gap: The benefits and harms of generative AI for novice programmers. In Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1. 469–486.

[27] Victor Rivera, Hamna Aslam, Alexandr Naumchev, Daniel de Carvalho, Mansur Khazeev, and Manuel Mazzara. 2020. Towards Code Review Guideline in a Classroom. In Frontiers in Software Engineering Education, Jean-Michel Bruel, Alfredo Capozucca, Manuel Mazzara, Bertrand Meyer, Alexandr Naumchev, and Andrey Sadovykh (Eds.). Springer International Publishing, Cha...

[28] Aaron Sams and Jonathan Bergmann. 2012. Flip your classroom: Reach every student in every class every day. International Society for Technology in Education/ISTE.

[29] Namita Sarawagi. 2014. A flipped CS0 classroom: applying Bloom’s taxonomy to algorithmic thinking. J. Comput. Sci. Coll. 29, 6 (June 2014), 21–28.

[30] Md Istiak Hossain Shihab, Christopher Hundhausen, Ahsun Tariq, Summit Haque, Yunhan Qiao, and Brian Wise Mulanda. 2025. The Effects of GitHub Copilot on Computing Students’ Programming Effectiveness, Efficiency, and Processes in Brownfield Coding Tasks. In Proceedings of the 2025 ACM Conference on International Computing Education Research V. 1. 407–420.

[31] Jinrui Tian and Ronghua Zhang. 2025. Learners’ AI dependence and critical thinking: The psychological mechanism of fatigue and the social buffering role of AI literacy. Acta Psychologica 260 (2025), 105725. doi:10.1016/j.actpsy.2025.105725

[32] Keith Topping. 1998. Peer Assessment Between Students in Colleges and Universities. Review of Educational Research 68, 3 (1998), 249–276. doi:10.3102/00346543068003249

[33] Scott Alexander Turner, Manuel A. Pérez-Quiñones, and Stephen H. Edwards. 2018. Peer Review in CS2: Conceptual Learning and High-Level Thinking. ACM Trans. Comput. Educ. 18, 3, Article 13 (Sept. 2018), 37 pages. doi:10.1145/3152715

[34] Muhammad Mahad Umair and Patrick Mukala. 2026. Generative AI-Driven or AI-Assisted Software Code Generation and the Decline of Community Knowledge Sharing: Challenges and Future Prospects. In Information System Design: AI and ML Applications, Vikrant Bhateja, Soly Mathew Biju, and Siba K. Udgata (Eds.). Springer Nature Singapore, Singapore, 115–125.

[35] Annapurna Vadaparty, David H. Smith IV, Samvrit Srinath, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, Daniel Zingaro, and Leo Porter.

[36] Integrating Large Language Models and Evaluating Student Outcomes in an Introductory Computer Science Course. arXiv:2510.18806 [cs.CY] https://arxiv.org/abs/2510.18806

[37] Annapurna Vadaparty, Daniel Zingaro, David H. Smith IV, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, and Leo Porter. 2024. CS1-LLM: Integrating LLMs into CS1 Instruction. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (Milan, Italy) (ITiCSE 2024). Association for Computing Machinery, New York, NY, USA,... doi:10.1145/3649217.3653584

[38] Priyan Vaithilingam, Tianyi Zhang, and Elena L Glassman. 2022. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–7.

[39] Yuankai Xue, Hanlin Chen, Gina R Bai, Robert Tairas, and Yu Huang. 2024. Does ChatGPT help with introductory programming? An experiment of students using ChatGPT in CS1. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering Education and Training. 331–341.

[40] Hatice Yildiz-Durak. 2019. Modeling Different Variables in Learning Basic Concepts of Programming in Flipped Classrooms. Journal of Educational Computing Research 58 (03 2019), 073563311982795. doi:10.1177/0735633119827956

[41] Noor Azlinda Zainal Abidin. 2024. The Efficacy of Flipped Classroom Models in Improving Student Engagement and Achievement: A Meta-Analysis. Global Synthesis in Education Journal 2 (11 2024), 25–44. doi:10.61667/v180e591

[42] Cynthia Zastudil, Magdalena Rogalska, Christine Kapp, Jennifer Vaughn, and Stephen MacNeil. 2023. Generative AI in computing education: Perspectives of students and instructors. In 2023 IEEE Frontiers in Education Conference (FIE). IEEE, 1–9.

[43] Yitong Zhao. 2018. Impact of Oral Exams on a Thermodynamics Course Performance. In 2018 ASEE Zone IV Conference. ASEE Conferences, Boulder, Colorado. https://peer.asee.org/29617

[44] Huiwen Zou, Ka Ian Chan, Patrick Pang, Blandina Manditereza, and Yi-Huang Shih. 2026. To Use but Not to Depend: Pedagogical Novelty and the Cognitive Brake of Ethical Awareness in Computer Science Students’ Adoption of Generative AI. Education Sciences 16, 2 (2026). doi:10.3390/educsci16020311