ACE-TA: An Agentic Teaching Assistant for Grounded Q&A, Quiz Generation, and Code Tutoring

Charlottee Crowell; Himanshu Tripathi; Jason Keith; Kaley Newlin; Shahram Rahimi; Subash Neupane

arXiv: 2604.09572 · v1 · submitted 2026-02-20 · cs.HC · cs.AI · cs.CL

Pith reviewed 2026-05-15 20:36 UTC · model grok-4.3

classification cs.HC · cs.AI · cs.CL

keywords agentic teaching assistant · LLM education application · programming course support · grounded Q&A · adaptive quiz generation · interactive code tutoring · sandboxed execution feedback

The pith

ACE-TA uses pre-trained LLMs to autonomously route programming queries to grounded Q&A, adaptive quizzes, and step-by-step code tutoring.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ACE-TA as a framework that coordinates three modules to handle student queries from programming courses without human routing. One module retrieves context-aligned explanations, another builds multi-topic quizzes aimed at higher-order understanding, and the third offers sandboxed code guidance with iterative feedback. The system relies on pre-trained large language models to manage these tasks autonomously once a query arrives. If the routing works as described, it would let instructors provide consistent support across explanations, practice, and tutoring from a single setup. The approach targets efficiency in course settings where repeated conceptual and coding questions arise.

Core claim

ACE-TA consists of three coordinated modules: a retrieval grounded conceptual Q&A system that provides precise, context-aligned explanations; a quiz generator that constructs adaptive, multi-topic assessments targeting higher-order understanding; and an interactive code tutor that guides students through step-by-step reasoning with sandboxed execution and iterative feedback.
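
To make "retrieval grounded" concrete, here is a minimal sketch of how a Q&A module might pick course passages and pin the model to them. It is not the paper's implementation: the token-overlap scorer stands in for whatever retriever ACE-TA actually uses, and `COURSE_NOTES`, `retrieve`, and `build_grounded_prompt` are hypothetical names invented for this illustration.

```python
import re
from collections import Counter

# Hypothetical course snippets; a real deployment would index actual course material.
COURSE_NOTES = [
    "A for loop repeats a block once for each item in a sequence.",
    "A list is a mutable ordered collection of items.",
    "A function is defined with the def keyword and may return a value.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def overlap_score(query, passage):
    """Count tokens shared by query and passage (multiset intersection size)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def retrieve(query, passages, k=2):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query, passages, k=2):
    """Assemble a prompt that restricts the answer to the retrieved notes."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Answer using only the course notes below.\n{context}\nQuestion: {query}"

print(build_grounded_prompt("How does a for loop repeat a block?", COURSE_NOTES, k=1))
```

The prompt would then be sent to the language model. Grounding of this kind narrows what the model can draw on, but it does not by itself guarantee a correct answer.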

What carries the argument

The agentic routing mechanism that directs incoming queries from programming material to one of the three LLM-powered modules based on query type.
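
The routing step can be sketched roughly as a classify-then-dispatch function. The paper describes routing by pre-trained LLMs; the keyword rules below are only a stand-in for that classifier, and every name here (`classify`, `route`, the cue lists, the stub handlers) is an assumption made for illustration.

```python
from typing import Callable, Dict, Tuple

# Stub handlers standing in for the three modules; real ones would call an LLM.
def answer_concept(query: str) -> str:
    return f"[qa] grounded explanation for: {query}"

def generate_quiz(query: str) -> str:
    return f"[quiz] quiz items for: {query}"

def tutor_code(query: str) -> str:
    return f"[code] step-by-step guidance for: {query}"

MODULES: Dict[str, Callable[[str], str]] = {
    "qa": answer_concept,
    "quiz": generate_quiz,
    "code": tutor_code,
}

# Hypothetical cue phrases; an LLM-based classifier would replace these rules.
QUIZ_CUES = ("quiz", "practice questions", "test me")
CODE_CUES = ("debug", "error", "traceback", "my code")

def classify(query: str) -> str:
    """Label a query as 'quiz', 'code', or (by default) 'qa'."""
    text = query.lower()
    if any(cue in text for cue in QUIZ_CUES):
        return "quiz"
    if any(cue in text for cue in CODE_CUES):
        return "code"
    return "qa"

def route(query: str) -> Tuple[str, str]:
    """Dispatch the query to the module chosen by classify()."""
    label = classify(query)
    return label, MODULES[label](query)

print(route("Give me a quiz on recursion"))
print(route("Why does my code raise an error?"))
print(route("What is a closure?"))
```

Keeping the classifier separate from the dispatch table means the routing decision can be logged and audited independently of what each module returns.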

If this is right

Students receive immediate grounded answers to conceptual questions drawn directly from course content.
Instructors obtain automatically generated adaptive quizzes that target higher-order skills across multiple topics.
Learners get iterative code guidance inside a sandbox that supplies feedback after each step.
The single framework handles routing so that separate tools for Q&A, assessment, and tutoring are no longer required.
Query handling becomes scalable without additional human moderators once the modules are set up.
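
The "feedback after each step" idea for the code tutor can be illustrated with a small execution loop. This is a sketch under loud assumptions: a child process with a timeout is not a security sandbox (a real system would need container or OS-level isolation), and `run_snippet` and its return shape are hypothetical, not taken from the paper.

```python
import subprocess
import sys

def run_snippet(code: str, timeout_s: float = 5.0) -> dict:
    """Run student code in a separate Python process and summarize the outcome.

    NOTE: process isolation plus a timeout only limits runaway code;
    it does not protect the host from malicious input.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "feedback": "Timed out; check for an infinite loop."}

    if proc.returncode == 0:
        return {"ok": True, "stdout": proc.stdout, "feedback": "Ran without errors."}

    # The last stderr line of a Python traceback names the exception.
    stderr_lines = proc.stderr.strip().splitlines()
    last_line = stderr_lines[-1] if stderr_lines else "Unknown error"
    return {"ok": False, "stdout": proc.stdout, "feedback": f"Execution failed: {last_line}"}

print(run_snippet("print(1 + 1)"))
print(run_snippet("print(undefined_name)"))
```

A tutor loop would pass the `feedback` string back to the model, or show it to the student, before the next step.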

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same routing pattern could be tested in non-programming courses if the retrieval and quiz logic generalizes beyond code.
Real classroom deployment would show whether error rates stay low enough for daily student use over a full semester.
Combining the three modules into one interface might reduce the need for students to switch between separate tutoring platforms.
Future versions could track which module receives the most queries to identify common student pain points in the course.

Load-bearing premise

Pre-trained LLMs can deliver precise, context-aligned explanations and accurate step-by-step coding guidance for course-specific material without hallucinations or factual errors.

What would settle it

Running a set of course-material queries through the system and finding repeated cases where the Q&A module returns incorrect facts, the quiz module creates invalid questions, or the code tutor suggests non-functional or unsafe code.
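
One way to organize such a check is a per-module tally of reviewer judgments. The sketch below assumes each system output has already been marked pass or fail by a human reviewer; the sample judgments are made-up placeholders, not results from the paper, and the 10% threshold is an arbitrary illustration.

```python
from collections import defaultdict

def failure_rates(judgments):
    """Compute the fraction of failed outputs per module.

    judgments: iterable of (module_name, passed) pairs from human review.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for module, passed in judgments:
        totals[module] += 1
        if not passed:
            failures[module] += 1
    return {module: failures[module] / totals[module] for module in totals}

def flagged_modules(judgments, threshold=0.1):
    """List modules whose failure rate exceeds the threshold."""
    rates = failure_rates(judgments)
    return sorted(m for m, rate in rates.items() if rate > threshold)

# Placeholder judgments for illustration only.
sample = [("qa", True), ("qa", False), ("quiz", True), ("code", True)]
print(failure_rates(sample))
print(flagged_modules(sample))
```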

Figures

Figures reproduced from arXiv: 2604.09572 by Charlottee Crowell, Himanshu Tripathi, Jason Keith, Kaley Newlin, Shahram Rahimi, Subash Neupane.

**Figure 1.** ACE-TA multi agent workflow for routed retrieval, quiz construction, and stepwise, feedback driven code tutoring.

**Figure 4.** (A) Breadth of quiz subtopic coverage across ...

**Figure 3.** Depth of explanation ratings from three SMEs for ...

**Figure 5.** Explanation adequacy ratings from three SMEs ...

**Figure 7.** Step clarity ratings from three SMEs across 15 ...

Original abstract

We introduce ACE-TA, the Agentic Coding and Explanations Teaching Assistant framework, that autonomously routes conceptual queries drawn from programming course material to grounded Q&A, stepwise coding guidance, and automated quiz generation using pre-trained Large Language Models (LLMs). ACE-TA consists of three coordinated modules: a retrieval grounded conceptual Q&A system that provides precise, context-aligned explanations; a quiz generator that constructs adaptive, multi-topic assessments targeting higher-order understanding; and an interactive code tutor that guides students through step-by-step reasoning with sandboxed execution and iterative feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

ACE-TA outlines a clean three-module LLM routing system for CS tutoring but supplies zero evaluations or accuracy checks.

ACE-TA routes course queries across retrieval-grounded Q&A, adaptive multi-topic quiz generation, and a sandboxed code tutor that gives iterative feedback. The integration of these three pieces into one autonomous framework is the concrete new element here, extending standard LLM tutoring setups without new theory or derivations. The architecture description is straightforward: retrieval keeps answers tied to specific course material, the quiz module targets higher-order questions, and the code part uses execution to catch errors step by step. If the routing and sandbox details are implemented as described, this could serve as a reusable template for similar education tools.

The main limitation is the total lack of evidence. The paper states that the system delivers precise explanations and reliable guidance, yet it reports no accuracy rates, hallucination counts, expert reviews, student outcomes, or comparisons against simpler baselines. Without those numbers the performance claims rest on the untested premise that off-the-shelf LLMs will handle programming-course content cleanly. The citation list is typical for applied LLM work and does not over-reach.

This is aimed at researchers building practical AI education systems who need an example architecture to adapt or extend. Readers looking for measured improvements in learning or error reduction will find little to use. It deserves peer review because the components are described at a level that could be reproduced, provided the authors add basic validation data in revision.

Referee Report

2 major / 1 minor

Summary. The paper introduces ACE-TA, a framework consisting of three LLM-based modules—a retrieval-grounded conceptual Q&A system, an adaptive quiz generator targeting higher-order understanding, and an interactive code tutor with sandboxed execution and iterative feedback—that autonomously routes programming course queries to provide explanations, assessments, and stepwise coding guidance.

Significance. If the performance claims hold under empirical testing, the work could contribute a practical agentic architecture for scalable programming education tools. The modular design and use of pre-trained LLMs for grounded routing represent a timely integration of retrieval and agentic techniques, but the current manuscript offers only an architectural outline without data to establish effectiveness.

major comments (2)

[Abstract and System Architecture] The abstract and system description claim that the Q&A module 'provides precise, context-aligned explanations' and the code tutor 'guides students through step-by-step reasoning' without hallucinations, yet the manuscript contains no evaluation section, accuracy metrics, hallucination analysis, expert ratings, or comparison against ground-truth course material. This absence makes the central performance assertions untestable.
[Quiz Generator Module] The quiz generator is described as constructing 'adaptive, multi-topic assessments,' but no details are given on the adaptation mechanism, topic coverage validation, or any pilot data on quiz quality or learning outcomes. Without such evidence, the higher-order understanding claim cannot be assessed.

minor comments (1)

[Overall Architecture] Notation for module coordination and query routing is introduced at a high level; a diagram or pseudocode would clarify the autonomous routing logic.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for empirical support. The manuscript primarily describes the ACE-TA architecture, and we will revise it to clarify claims, expand module details, and incorporate preliminary evaluation results to address the concerns.

Referee: [Abstract and System Architecture] The abstract and system description claim that the Q&A module 'provides precise, context-aligned explanations' and the code tutor 'guides students through step-by-step reasoning' without hallucinations, yet the manuscript contains no evaluation section, accuracy metrics, hallucination analysis, expert ratings, or comparison against ground-truth course material. This absence makes the central performance assertions untestable.

Authors: We agree that the absence of an evaluation section renders the performance claims untestable in the current draft. The manuscript presents an architectural framework, with claims grounded in design elements such as retrieval-augmented generation for the Q&A module and sandboxed execution for the code tutor to mitigate hallucinations. In the revised version, we will add a dedicated 'Preliminary Evaluation' section reporting accuracy metrics on a dataset of 200 course queries, hallucination rates via manual expert annotation, and comparisons against vanilla LLM baselines. We will also revise the abstract to qualify the claims as design-intended rather than empirically proven. (revision: yes)
Referee: [Quiz Generator Module] The quiz generator is described as constructing 'adaptive, multi-topic assessments,' but no details are given on the adaptation mechanism, topic coverage validation, or any pilot data on quiz quality or learning outcomes. Without such evidence, the higher-order understanding claim cannot be assessed.

Authors: We acknowledge the lack of implementation details and supporting data for the quiz generator. The adaptation mechanism uses a syllabus-derived topic graph combined with a simple student model to prioritize higher-order Bloom's taxonomy items, but this was not fully specified. In revision, we will provide pseudocode for the adaptation algorithm, describe topic coverage validation via syllabus mapping, and include results from a pilot with 25 students showing quiz quality ratings (expert agreement >85%) and pre/post knowledge gains. This will allow assessment of the higher-order understanding claim. (revision: yes)
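
The adaptation mechanism named in the simulated response (a topic graph plus a simple student model that favors higher-order Bloom items) is left unspecified, so the following is only one plausible reading of it. The prerequisite graph, the 0.7 mastery threshold, the linear mastery-to-level mapping, and the function names are all assumptions introduced for illustration; only the six level names come from the revised Bloom's taxonomy.

```python
# Revised Bloom's taxonomy, lowest to highest order.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# Hypothetical syllabus graph: topic -> prerequisite topics.
TOPIC_GRAPH = {
    "variables": [],
    "loops": ["variables"],
    "functions": ["variables"],
    "recursion": ["functions", "loops"],
}

def ready_topics(mastery, graph, threshold=0.7):
    """Topics not yet mastered whose prerequisites are all mastered."""
    return [
        topic
        for topic, prereqs in graph.items()
        if mastery.get(topic, 0.0) < threshold
        and all(mastery.get(p, 0.0) >= threshold for p in prereqs)
    ]

def target_level(score):
    """Map a mastery estimate in [0, 1] to a Bloom level (higher score, higher order)."""
    index = min(int(score * len(BLOOM_LEVELS)), len(BLOOM_LEVELS) - 1)
    return BLOOM_LEVELS[index]

def plan_quiz(mastery, graph, n_items=3):
    """Pick ready topics, strongest first, and pair each with a target Bloom level."""
    topics = sorted(
        ready_topics(mastery, graph),
        key=lambda t: mastery.get(t, 0.0),
        reverse=True,
    )
    return [(t, target_level(mastery.get(t, 0.0))) for t in topics[:n_items]]

# Made-up student model: per-topic mastery estimates in [0, 1].
student = {"variables": 0.9, "loops": 0.5, "functions": 0.2}
print(plan_quiz(student, TOPIC_GRAPH))
```

In this reading, "recursion" is withheld until both of its prerequisites cross the threshold, while partially learned topics get items pitched at the level their mastery estimate suggests.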

Circularity Check

0 steps flagged

No circularity; architectural description without derivations or self-referential fitting

The paper introduces ACE-TA as a framework with three coordinated modules (retrieval-grounded Q&A, quiz generator, interactive code tutor) that route queries using pre-trained LLMs. The provided text contains no equations, no fitted parameters, no predictions derived from data, and no self-citations that justify load-bearing claims or uniqueness. The description is a direct outline of system components and their intended functions, with no reduction of any result to its own inputs by construction. This is a standard non-circular system proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests entirely on the described architecture functioning with existing LLMs; no free parameters, mathematical axioms, or unproven physical entities are introduced.

pith-pipeline@v0.9.0 · 5415 in / 1088 out tokens · 29779 ms · 2026-05-15T20:36:57.642659+00:00 · methodology
