Towards SocratiCode: Designing a Generative AI-Based Programming Tutor for K-12 Students through a 4-Week Participatory Design Study
Pith reviewed 2026-05-20 09:35 UTC · model grok-4.3
The pith
Generative AI for K-12 programming works best as a Socratic questioner embedded in human-guided lessons rather than as a direct answer provider.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across the study iterations the system shifted toward dialogic support through guided questioning, reflection prompts, misconception checks, incremental hints, and mandatory pauses for learner input; preliminary observations indicate this change improved explanation clarity, supported problem-solving engagement, and better matched novice needs when combined with human guidance.
What carries the argument
SocratiCode is the evolving adaptive tutorial system whose refinement into a Socratic tutoring model supplies guided questions and learner-input pauses instead of full solutions.
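The paper does not publish SocratiCode's implementation here, so the following is only a minimal sketch of the two mechanisms named above: incremental hints and a mandatory pause for learner input. The class name `HintLadder`, its fields, and the example prompts are all hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass


@dataclass
class HintLadder:
    """Hypothetical tutor policy: ask first, then hint step by step.

    It never emits a full solution and never speaks twice in a row.
    """
    question: str            # opening guided question (assumed content)
    hints: list[str]         # incremental hints, weakest first (assumed content)
    step: int = 0            # how many tutor turns have been taken
    awaiting_learner: bool = False

    def tutor_turn(self) -> str:
        # Mandatory pause: the tutor must wait for learner input.
        if self.awaiting_learner:
            raise RuntimeError("Mandatory pause: learner must respond first.")
        self.awaiting_learner = True
        prompts = [self.question, *self.hints]
        # Once the hints run out, repeat the last hint instead of an answer.
        message = prompts[min(self.step, len(prompts) - 1)]
        self.step += 1
        return message

    def learner_turn(self, text: str) -> None:
        # An empty reply does not count as learner input.
        if not text.strip():
            raise ValueError("Learner input must not be empty.")
        self.awaiting_learner = False
```

In this sketch a session alternates `tutor_turn()` and `learner_turn(...)`; calling `tutor_turn()` twice in a row raises an error, which is one simple way to enforce the pause. A real system would generate the question and hints with a language model rather than reading them from a fixed list.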
Load-bearing premise
Feedback from only two K-12 students across four weeks can show reliable gains in clarity, engagement, and fit for a wider population of novice learners.
What would settle it
A controlled trial that assigns many more K-12 students to either the final Socratic version or a directive answer-giving version and measures differences in problem-solving success and reported confusion.
Original abstract
Generative AI creates new opportunities for programming education, but many existing systems remain overly directive, producing lengthy explanations and premature solutions that can overwhelm K-12 novices. In this paper, we present a participatory design study of how an adaptive tutorial system, SocratiCode, evolved toward a Socratic tutoring model for beginner programming instruction. Drawing on weekly learner feedback, we iteratively refined the system over a four-week study with two K-12 students learning Python. Across iterations, the system shifted from flexible tutorial generation toward a more dialogic form of support characterized by guided questioning, reflection prompts, misconception checks, incremental hints, and mandatory pauses for learner input. Our preliminary observations suggest that this Socratic shift improved explanation clarity, supported problem-solving engagement, and better aligned instruction with novice learners' needs, especially when combined with human guidance. We argue that generative AI in K-12 programming education may be most effective not as an answer engine, but as a Socratic, adaptive learning companion embedded within a human-guided instructional framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports on a 4-week participatory design study with two K-12 students in which the authors iteratively refined a generative-AI programming tutor (SocratiCode) from a flexible tutorial generator into a dialogic Socratic system that uses guided questioning, reflection prompts, misconception checks, incremental hints, and mandatory pauses. Preliminary qualitative observations from weekly learner feedback are presented as evidence that the Socratic shift improved explanation clarity, supported problem-solving engagement, and better aligned with novice needs when combined with human guidance; the authors conclude that generative AI in K-12 programming education is most effective as an embedded Socratic companion within a human-guided instructional framework.
Significance. If the reported benefits of the Socratic features can be replicated at scale, the work would supply useful design heuristics for AI tutors aimed at young beginners, particularly the value of mandatory reflection pauses and incremental scaffolding over direct answer generation. At present the contribution remains exploratory and design-oriented rather than a validated pedagogical result.
major comments (2)
- [Abstract and §4] Results/Observations: the claims that the Socratic shift 'improved explanation clarity, supported problem-solving engagement, and better aligned instruction with novice learners' needs' rest solely on qualitative observations from two participants; no quantitative pre/post learning or engagement metrics, no control condition, and no systematic error analysis are reported, leaving attribution to the dialogic features insecure.
- [Discussion] The broader argument that generative AI 'may be most effective not as an answer engine, but as a Socratic, adaptive learning companion embedded within a human-guided instructional framework' extrapolates from iterative refinements driven by feedback from only two learners over four weeks; the manuscript provides no evidence that the observed changes are driven by the Socratic elements themselves rather than learner-specific factors, consistent human guidance, or study duration.
minor comments (2)
- [Methods] Provide the exact system prompts or prompt-engineering changes applied at each weekly iteration so that the design trajectory can be reproduced or extended by other researchers.
- [Figures] Ensure any diagrams showing the evolution of the tutor interface across the four weeks are explicitly labeled with iteration number and the specific Socratic features introduced at each step.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which has helped us better frame the exploratory scope of this participatory design study. We respond to each major comment below and describe the changes incorporated into the revised manuscript.
Point-by-point responses
- Referee: [Abstract and §4] the claims that the Socratic shift 'improved explanation clarity, supported problem-solving engagement, and better aligned instruction with novice learners' needs' rest solely on qualitative observations from two participants; no quantitative pre/post learning or engagement metrics, no control condition, and no systematic error analysis are reported, leaving attribution to the dialogic features insecure.
  Authors: We agree that the reported observations are qualitative, drawn from only two participants, and lack quantitative pre/post measures, a control condition, or systematic error analysis. This is consistent with the participatory design methodology of the study, which prioritized iterative refinement based on weekly learner feedback rather than controlled experimentation. In the revised manuscript we have updated the abstract and Section 4 to qualify all statements as preliminary observations from the design process. We now explicitly note the absence of quantitative metrics and control conditions, avoid causal language regarding attribution to the dialogic features, and add a forward-looking statement calling for larger-scale studies with such measures to validate the observed patterns.
  Revision: yes
- Referee: [Discussion] the broader argument that generative AI 'may be most effective not as an answer engine, but as a Socratic, adaptive learning companion embedded within a human-guided instructional framework' extrapolates from iterative refinements driven by feedback from only two learners over four weeks; the manuscript provides no evidence that the observed changes are driven by the Socratic elements themselves rather than learner-specific factors, consistent human guidance, or study duration.
  Authors: We accept that the small sample and study duration limit the strength of broader claims and that alternative explanations (learner-specific factors, human guidance, or simply the passage of time) cannot be ruled out from the available data. We have revised the Discussion section to present the argument as a set of design heuristics emerging from this case rather than a generalizable conclusion. We have added explicit discussion of potential confounds, including the role of consistent human guidance, and inserted a new Limitations subsection that directly addresses the small participant count, the four-week timeframe, and the inability to isolate the contribution of the Socratic elements from other study variables.
  Revision: yes
Circularity Check
No significant circularity in qualitative participatory design study
full rationale
The paper describes a 4-week participatory design process with two K-12 students in which SocratiCode was iteratively refined based on direct weekly feedback. All claims about improved clarity, engagement, and alignment with novice needs are presented as preliminary observations drawn from that feedback and the resulting design changes. No equations, fitted parameters, predictions, uniqueness theorems, or self-citation chains appear; the work contains no derivations that could reduce outputs to inputs by construction. The study is therefore self-contained against external benchmarks and receives a circularity score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: weekly feedback from a small number of K-12 learners in participatory sessions accurately identifies effective tutoring strategies for novice programmers.