Design and Deployment of a Course-Aware AI Tutor in an Introductory Programming Course
Pith reviewed 2026-05-10 16:17 UTC · model grok-4.3
The pith
A course-specific Python tutor supplies hints and Socratic questions drawn from class materials instead of complete solutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We designed and deployed a course-specific online Python tutor that provides retrieval-augmented, course-aligned guidance without generating complete solutions. The tutor integrates a web-based programming environment with a conversational agent that offers hints, Socratic questions, and explanations grounded in course materials. Students used the system during self-study to work on homework assignments, and the tutor also supported questions about the broader course material. We observed that students used the tutor primarily for conceptual understanding, implementation guidance, and debugging, and perceived it as a course-aligned, context-aware learning support that encourages engagement.
What carries the argument
A retrieval-augmented conversational agent that draws only on the course's own materials to generate hints, Socratic questions, and partial explanations, and that refuses to emit full program solutions.
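The retrieve-then-restrict pattern described here can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper summary does not specify the retriever or the prompt, so the word-overlap ranking, the `COURSE_CHUNKS` snippets, and the `retrieve` and `build_prompt` helpers are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical course snippets; a real deployment would index the actual course files.
COURSE_CHUNKS = [
    "A for loop repeats a block once for every item in a sequence.",
    "A list stores ordered items and supports indexing and slicing.",
    "A function is defined with def and returns a value with return.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(question, chunks, k=2):
    """Return up to k chunks ranked by word overlap with the question."""
    question_words = Counter(tokenize(question))
    scored = []
    for chunk in chunks:
        overlap = sum((question_words & Counter(tokenize(chunk))).values())
        scored.append((overlap, chunk))
    # The sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for overlap, chunk in scored[:k] if overlap > 0]


def build_prompt(question, chunks):
    """Assemble a prompt that grounds the model in course text and forbids full solutions."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return (
        "You are a course tutor. Use only the course excerpts below.\n"
        "Give hints or guiding questions. Do not write a complete solution.\n"
        f"Course excerpts:\n{context}\n"
        f"Student question: {question}"
    )
```

Calling `build_prompt("How does a for loop work?", COURSE_CHUNKS)` yields a prompt whose context is limited to the best-matching excerpts. A production system would more plausibly use embedding-based retrieval, and a prompt instruction alone does not guarantee that the model never emits a full solution.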
If this is right
- Students receive targeted help on homework while still having to carry out the core coding steps themselves.
- The same system can answer questions about lecture concepts and course policies in addition to assignment-specific issues.
- Usage logs show the dominant requests are for explanations of ideas, step-by-step implementation pointers, and help locating bugs.
- Students report the tutor feels tied to their class rather than a generic tool that simply hands out answers.
Where Pith is reading between the lines
- The same retrieval-plus-restriction pattern could be tried in other first-year technical courses where over-reliance on general AI tools is a concern.
- Future deployments could add logging of whether students eventually solve the problem after receiving hints, to track changes in persistence.
- If the tutor is made available across multiple semesters, usage patterns might reveal which course topics consistently need more scaffolding.
Load-bearing premise
The argument assumes that the collected student feedback forms and interaction logs are enough to show that the tutor actually keeps novices from neglecting their own problem-solving practice.
What would settle it
A controlled comparison in which students who used the tutor are later asked to complete similar programming tasks without any AI help, and their independent solution rates are measured against a matched group that never had the tutor.
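One way such a comparison might be analyzed is a pooled two-proportion z statistic on independent solution rates. The sketch below is an assumption about the analysis, not something the paper reports. The function name and its parameters are hypothetical, and the paper provides no counts to plug in.

```python
import math


def solution_rate_difference(solved_a, n_a, solved_b, n_b):
    """Return (rate_a - rate_b, z) using a pooled two-proportion z statistic.

    solved_a / n_a: students in the tutor group who solved the held-out task unaided.
    solved_b / n_b: the same counts for the matched group without the tutor.
    """
    rate_a = solved_a / n_a
    rate_b = solved_b / n_b
    pooled = (solved_a + solved_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # Guard against a zero standard error (all or no students solved the task).
    z = (rate_a - rate_b) / standard_error if standard_error > 0 else 0.0
    return rate_a - rate_b, z
```

A large absolute z value would suggest the groups differ in independent solution rate. With small classes an exact test may be the safer choice, and matching quality matters at least as much as the statistic.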
Original abstract
Large Language Models (LLMs) have become part of how students solve programming tasks, offering immediate explanations and even full solutions. Previous work has highlighted that novice programmers often heavily rely on LLMs, thereby neglecting their own problem-solving skills. To address this challenge, we designed a course-specific online Python tutor that provides retrieval-augmented, course-aligned guidance without generating complete solutions. The tutor integrates a web-based programming environment with a conversational agent that offers hints, Socratic questions, and explanations grounded in course materials. Students used the system during self-study to work on homework assignments, and the tutor also supported questions about the broader course material. We collected structured student feedback and analyzed interaction logs to investigate how they engaged with the tutor's guidance. We observed that students used the tutor primarily for conceptual understanding, implementation guidance, and debugging, and perceived it as a course-aligned, context-aware learning support that encourages engagement rather than direct solution copying.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the design and deployment of a course-aware AI tutor for an introductory Python programming course. The system integrates a web-based IDE with a conversational agent that uses retrieval-augmented generation over course materials to deliver hints, Socratic questions, and explanations without generating complete solutions. Based on structured student feedback and interaction logs collected during self-study on homework, the authors observe that students primarily sought conceptual understanding, implementation guidance, and debugging support, and perceived the tutor as course-aligned support that encourages engagement rather than direct solution copying.
Significance. If the observations hold, the work offers a concrete example of a controlled LLM integration in programming education that aims to provide immediate support while discouraging over-reliance. It contributes practical design insights for course-specific tutors and real-world deployment data from an introductory course, which could inform future systems seeking to balance AI assistance with skill development.
major comments (1)
- [§5] §5 (Results/Evaluation): The central claim that the tutor design addresses novices neglecting their own problem-solving skills rests on the observation that students used it for conceptual hints and perceived it as encouraging engagement. However, the interaction logs only categorize query types without proxies for independent effort (e.g., time-to-first-query or error sequences before tutor use), and no baseline comparison to unrestricted LLM access or pre/post measures of problem-solving performance on held-out tasks are reported. Structured feedback is self-reported and lacks controls for social-desirability bias, leaving the claim that the system mitigates skill neglect unsupported by the presented evidence.
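The effort proxies named in this comment could be computed from timestamped event logs along the following lines. The log schema is an assumption: the paper does not describe its log format, so the `(iso_timestamp, kind)` tuples and the event kinds `"open"`, `"run"`, and `"query"` are hypothetical.

```python
from datetime import datetime


def _ordered(events):
    """Parse ISO timestamps and sort one student's events for one task by time."""
    return sorted((datetime.fromisoformat(ts), kind) for ts, kind in events)


def time_to_first_query(events):
    """Seconds from the first logged event to the first tutor query, or None if no query."""
    ordered = _ordered(events)
    if not ordered:
        return None
    start = ordered[0][0]
    for when, kind in ordered:
        if kind == "query":
            return (when - start).total_seconds()
    return None


def runs_before_first_query(events):
    """Count code executions the student attempted before first asking the tutor."""
    count = 0
    for _, kind in _ordered(events):
        if kind == "query":
            break
        if kind == "run":
            count += 1
    return count
```

Longer times and more runs before the first query would be weak signals of independent effort. Neither measure separates productive struggle from idle time, so they complement outcome measures rather than replace them.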
minor comments (2)
- [Abstract] Abstract and §4 (Data Collection): Sample size, exact number of interactions, distribution of query categories, and details of the qualitative/quantitative analysis methods are not reported, reducing the ability to assess the robustness of the observations.
- [§3] §3 (System Design): The precise prompting strategy and retrieval mechanism used to enforce 'no complete solutions' should be described in more detail, including any failure cases where full code was still generated.
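Failure cases of the kind requested here could be surfaced with a simple post-hoc filter over logged tutor replies. The heuristic below is an illustrative assumption, not the paper's guardrail. The five-line threshold, the def-plus-return rule, and the helper names are all hypothetical.

```python
import re

# Built from single backticks so the source itself contains no literal code fence.
FENCE = "`" * 3
BLOCK_PATTERN = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)


def code_blocks(reply):
    """Return the contents of all fenced code blocks in a tutor reply."""
    return BLOCK_PATTERN.findall(reply)


def looks_like_full_solution(reply, max_code_lines=5):
    """Flag replies with a long code block or a complete function definition."""
    for block in code_blocks(reply):
        lines = [line for line in block.splitlines() if line.strip()]
        has_def = any(line.lstrip().startswith("def ") for line in lines)
        has_return = any(line.lstrip().startswith("return") for line in lines)
        if len(lines) > max_code_lines or (has_def and has_return):
            return True
    return False
```

Running this over the interaction logs would give a rough count of replies that deserve manual review. It misses solutions written outside fenced blocks and flags legitimate longer examples, so it is a triage aid rather than an enforcement mechanism.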
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We address the single major comment below, indicating where revisions will be made to clarify claims and limitations.
Referee: [§5] §5 (Results/Evaluation): The central claim that the tutor design addresses novices neglecting their own problem-solving skills rests on the observation that students used it for conceptual hints and perceived it as encouraging engagement. However, the interaction logs only categorize query types without proxies for independent effort (e.g., time-to-first-query or error sequences before tutor use), and no baseline comparison to unrestricted LLM access or pre/post measures of problem-solving performance on held-out tasks are reported. Structured feedback is self-reported and lacks controls for social-desirability bias, leaving the claim that the system mitigates skill neglect unsupported by the presented evidence.
Authors: We appreciate the referee highlighting the distinction between design intent and empirical demonstration of skill preservation. The manuscript frames the work as an observational deployment study of a course-specific tutor that avoids generating complete solutions. The reported observations are that interaction logs show predominant use for conceptual understanding, implementation guidance, and debugging, while structured feedback indicates students perceived the tutor as aligned with course goals and encouraging engagement. We do not claim to have measured mitigation of skill neglect via direct proxies (time-to-first-query, error sequences) or controlled comparisons, as these data were not collected. Self-reported perceptions are indeed subject to bias. We will revise §5 and the discussion to explicitly distinguish the design goals and observed usage patterns from any assertion of proven skill development outcomes. The limitations section will be expanded to note the absence of baseline or pre/post measures and to recommend controlled studies for future work.
Revision: partial
- We cannot add proxies for independent effort, baseline comparisons to unrestricted LLMs, or pre/post performance measures on held-out tasks, as the study was a single-group observational deployment without these elements in the data collection protocol.
Circularity Check
No circularity: purely descriptive design and observation study
The paper describes the design and deployment of a course-specific AI tutor, reports interaction logs and structured student feedback, and presents observational findings on usage patterns. No equations, fitted parameters, predictions, uniqueness theorems, or derivation chains appear in the work. Claims about student engagement and perceptions are drawn directly from the collected data without any reduction to self-defined inputs or self-citation load-bearing steps. This is a standard non-circular descriptive systems paper.
PREPRINT – 18th International Conference on Computer Supported Education 2026