AI-Generated Traces for Novice Programmers: Learning Effects and Learner Differences in a Multi-Institutional Study

Anastasiia Birillo; Gosia Migut; Michael Liut; Naaz Sibia; Thomas Overklift Vaupel Klein; Yuri Noviello

arxiv: 2606.03288 · v1 · pith:M724NKMVnew · submitted 2026-06-02 · 💻 cs.CY · cs.AI

Pith reviewed 2026-06-28 08:19 UTC · model grok-4.3

keywords: AI-generated visualizations, novice programmers, CS1, animated traces, program execution, learner engagement, educational technology, multi-institutional study

The pith

AI-generated animated traces improve immediate learning of program execution for some novices but effects are short-term and depend on engagement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Generated Animated Traces as AI-created narrated animations that link code, execution states, and analogies to help CS1 students grasp how programs run. A multi-institutional experiment with over a thousand students in Python and Java courses compared these animations against plain text explanations, tracking both immediate test scores and longer-term course outcomes. Results indicate selective gains right after exposure for certain learners, yet no lasting advantage appears on final exams or overall engagement. The size of any benefit varies with individual student engagement patterns, pointing toward the need for tools that adapt to different learner profiles rather than one-size-fits-all delivery.

Core claim

Generated Animated Traces (GATs) are AI-generated, analogy-based, narrated animations that coordinate source code, execution state, and conceptual analogies. In the two-institution study, GATs produced selective benefits for immediate learning performance compared with textual explanations, yet these benefits remained context-dependent and short-term; GATs' influence on performance was moderated by learner engagement profiles, underscoring the value of personalized approaches.

What carries the argument

Generated Animated Traces (GATs): AI-generated narrated animations that coordinate source code, runtime state, and conceptual analogies to make program execution explicit.

If this is right

GATs produce immediate performance gains on execution-related tasks for some students but not others.
Any immediate gains do not carry over to end-of-course exam scores or sustained engagement measures.
Learner engagement profiles moderate whether GATs affect performance at all.
Educational tools for programming benefit from adaptation to individual engagement patterns.
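
A minimal sketch of what "moderated by engagement profiles" can look like numerically: the GAT-minus-text difference is computed separately within each profile, and the gap between those differences is the interaction contrast. The profile labels (`high`, `low`), the scores, and the function name `effect_by_profile` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (engagement_profile, condition, immediate_score).
# Labels and scores are invented for illustration only.
records = [
    ("high", "gat", 8.0), ("high", "gat", 9.0),
    ("high", "text", 8.0), ("high", "text", 9.0),
    ("low", "gat", 7.0), ("low", "gat", 6.0),
    ("low", "text", 4.0), ("low", "text", 5.0),
]


def effect_by_profile(rows):
    """Return {profile: mean(GAT scores) - mean(text scores)}."""
    cells = defaultdict(list)
    for profile, condition, score in rows:
        cells[(profile, condition)].append(score)
    profiles = sorted({profile for profile, _, _ in rows})
    return {
        p: mean(cells[(p, "gat")]) - mean(cells[(p, "text")])
        for p in profiles
    }


effects = effect_by_profile(records)
# Interaction contrast: how much the GAT effect differs between profiles.
interaction = effects["low"] - effects["high"]
print(effects, interaction)
```

In this toy data the treatment effect is zero for one profile and positive for the other, which is the pattern a moderation claim describes. A real analysis would also need uncertainty estimates, which this sketch omits.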

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of future AI learning tools could build real-time detection of engagement to decide when to switch between animation and text.
The same coordination of code, state, and analogy might be tested in other process-heavy domains such as chemistry reaction mechanisms.
Short-term benefits suggest GATs work best when embedded repeatedly inside practice sessions rather than offered as standalone resources.

Load-bearing premise

Differences in immediate learning and course outcomes can be attributed to GATs versus text explanations without major interference from institutional differences, varying student populations, or unmeasured variables.

What would settle it

A follow-up experiment at the same institutions that finds identical immediate post-exposure scores and identical end-of-course exam results between GAT and text groups after controlling for engagement profiles.

Figures

Figures reproduced from arXiv: 2606.03288 by Anastasiia Birillo, Gosia Migut, Michael Liut, Naaz Sibia, Thomas Overklift Vaupel Klein, Yuri Noviello.

**Figure 3.** Example of a textual explanation provided to the …

**Figure 4.** Engagement profiles and moderated treatment ef…

Original abstract

Introductory programming (CS1) courses often struggle to support students' understanding of program execution. While visualizations can make execution processes explicit, their effectiveness depends on design and context, and empirical evidence for AI-generated visualizations remains limited. We propose Generated Animated Traces (GATs), AI-generated, analogy-based, narrated animations that coordinate source code, execution state, and conceptual analogies. We conduct a study at two institutions in CS1 courses (Python, N=961; Java, N=151) comparing GATs to textual explanations. We measure immediate learning performance and experience, end-of-course engagement and exam performance. Results show that GATs can yield selective benefits for immediate learning, but benefits are context-dependent and short-term. We observe that GATs' influence on performance is moderated by learner engagement profiles. This finding underscores the importance of personalized approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Multi-institutional trial of AI-generated animated traces in CS1 reports selective short-term gains moderated by engagement, but site and language differences need explicit handling to support the claims.

The main thing to know is that this paper ran a large comparison of AI-generated animated traces against plain text explanations across two CS1 courses, one Python with 961 students and one Java with 151. They report selective immediate learning benefits that depend on student engagement profiles, but those effects do not carry through to end-of-course exams or engagement.

The scale is the clearest strength. Most visualization studies in this area stay small and single-site; here they collected data from two institutions and tracked both immediate performance and longer-term outcomes. That setup gives more weight to the claim that benefits are context-dependent and short-term, and it supports their point about needing personalization.

The soft spot is the multi-institutional design. Different languages, different student populations, and different course structures sit right next to the intervention. The abstract gives no sign that institution or language was entered as a fixed effect, random effect, or interaction term. If the full analysis does not test or control for those factors, the moderation results and the attribution to GATs become harder to separate from baseline site variation. The methods details on exclusion criteria, exact measures, and effect sizes are also missing from the abstract, so the strength of the evidence cannot be judged yet.

This paper is for CS education researchers who work on tools for novices or on learner differences. A reader who already follows visualization or AI-assisted programming work would get incremental data points, but only after checking the statistical controls and robustness checks in the full text.

I would send it to peer review. The sample size and the attempt to link immediate effects to engagement profiles make it worth a referee's time, even if the analysis needs tightening on the institutional factors.

Referee Report

2 major / 2 minor

Summary. The manuscript reports results from a multi-institutional study in CS1 courses comparing Generated Animated Traces (GATs)—AI-generated, analogy-based narrated animations coordinating code, execution state, and analogies—to textual explanations. Samples are N=961 (Python) and N=151 (Java). The central claims are that GATs produce selective benefits for immediate learning that are context-dependent and short-term, and that GAT effects on performance are moderated by learner engagement profiles.

Significance. If the attribution to GATs survives proper controls for institutional and language differences, the work would add to CS education research by showing how AI-generated visualizations can support program execution understanding and by underscoring the value of engagement-profile moderation for personalization. The large Python sample is a positive feature.

major comments (2)

[Abstract and study design description] The abstract states results on selective benefits and moderation by engagement profiles but supplies no statistical methods, effect sizes, controls, exclusion criteria, or measurement details, making it impossible to evaluate whether the data support the stated claims.
[Multi-institutional design] The study is conducted at two institutions using different languages and presumably different student populations and course structures, yet the design description provides no indication that institution or language is modeled as a fixed effect, random effect, or interaction term in the performance analyses. This is load-bearing for the central claim that attributes differences in immediate learning and moderation to GATs rather than baseline institutional variation.
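
To make the second concern concrete, here is a small sketch, under invented numbers, of how a pooled comparison that ignores site can differ from one that compares conditions within each site first. The site keys, the scores, and the function names `pooled_difference` and `stratified_difference` are hypothetical, and the unbalanced allocation is deliberately exaggerated; this is not the paper's analysis.

```python
from statistics import mean

# Hypothetical toy data: scores[site][condition] -> list of scores.
# Site names echo the two cohorts; every number is invented.
scores = {
    "python": {"gat": [8.0, 8.0, 8.0], "text": [7.0]},
    "java": {"gat": [5.0], "text": [4.0, 4.0, 4.0]},
}


def pooled_difference(data):
    """Naive GAT-minus-text difference that ignores site."""
    gat = [s for cells in data.values() for s in cells["gat"]]
    text = [s for cells in data.values() for s in cells["text"]]
    return mean(gat) - mean(text)


def stratified_difference(data):
    """Within-site differences, averaged with weights equal to site size."""
    total = sum(len(c["gat"]) + len(c["text"]) for c in data.values())
    estimate = 0.0
    for cells in data.values():
        n_site = len(cells["gat"]) + len(cells["text"])
        within = mean(cells["gat"]) - mean(cells["text"])
        estimate += (n_site / total) * within
    return estimate


naive = pooled_difference(scores)
adjusted = stratified_difference(scores)
print(naive, adjusted)
```

Here the within-site effect is the same at both sites, yet the pooled figure is larger because baseline scores differ by site and the conditions are unevenly distributed across sites. Entering site as a model term, or reporting cohorts separately, addresses the same issue.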

minor comments (2)

[Methods] Clarify the exact operationalization of 'immediate learning performance,' 'end-of-course engagement,' and 'exam performance' and whether any pre-tests or covariates were used.
[Abstract] The abstract could more explicitly note the short-term nature of the observed benefits to avoid overgeneralization in the opening summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments. We address each major comment below and will make revisions to improve the manuscript's clarity and rigor.

Referee: Abstract and study design description: the abstract states results on selective benefits and moderation by engagement profiles but supplies no statistical methods, effect sizes, controls, exclusion criteria, or measurement details, rendering it impossible to evaluate whether the data support the stated claims.

Authors: We agree with the referee that the abstract would benefit from additional details on the statistical methods to support the claims. In the revised version, we will include a concise description of the key statistical approaches, effect sizes where relevant, and a note on controls and exclusion criteria within the abstract's constraints. revision: yes

Referee: Multi-institutional design: the study is conducted at two institutions using different languages and presumably different student populations and course structures, yet the design description provides no indication that institution or language is modeled as a fixed effect, random effect, or interaction term in the performance analyses. This is load-bearing for the central claim that attributes differences in immediate learning and moderation to GATs rather than baseline institutional variation.

Authors: The analyses were performed separately for the Python and Java cohorts to account for differences in language and institutional contexts. To strengthen the manuscript, we will revise the methods and results sections to explicitly describe how institutional and language differences were handled, including any use of fixed effects or covariates for institution, and discuss potential limitations in attributing effects solely to GATs. revision: yes
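
As an illustration of reporting an effect size separately per cohort, the sketch below computes Cohen's d (standardized mean difference with a pooled standard deviation) for each of two cohorts. The choice of Cohen's d is an assumption made here for illustration; the text does not say which effect size the authors use, and all scores are invented.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical scores per cohort; not data from the study.
cohorts = {
    "python": {"gat": [6.0, 8.0, 10.0], "text": [4.0, 6.0, 8.0]},
    "java": {"gat": [5.0, 7.0, 9.0], "text": [5.0, 7.0, 9.0]},
}

d_by_cohort = {
    name: cohens_d(cells["gat"], cells["text"])
    for name, cells in cohorts.items()
}
print(d_by_cohort)
```

Reporting one value per cohort keeps language and institution from being mixed into a single estimate, at the cost of lower power in the smaller cohort.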

Circularity Check

0 steps flagged

No circularity: purely empirical study with no derivations or self-referential predictions

This paper reports results from a multi-institutional empirical experiment comparing Generated Animated Traces (GATs) to textual explanations in CS1 courses. It measures immediate learning, engagement, and exam performance using standard statistical comparisons across Python (N=961) and Java (N=151) cohorts. There are no equations, derivations, fitted parameters presented as predictions, uniqueness theorems, or ansatzes. All claims rest on observed data outcomes rather than any reduction to inputs by construction. The multi-institutional design and the moderation analyses by engagement profile are standard empirical methods and involve neither load-bearing self-citation nor the renaming of known results as new derivations. This matches the default expectation of no significant circularity for non-theoretical empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no mathematical content, free parameters, or new postulated entities are described. The work relies on standard assumptions of educational research experiments that are not detailed here.
