Dean of LLM Tutors: A Framework for Automated Quality Review of AI-generated Feedback

2025 · cs.CY · arXiv 2508.05952

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Large language model (LLM) tutors are increasingly used to generate educational feedback, but existing research has focused mainly on feedback generation rather than feedback evaluation. As a result, LLM-generated feedback may offer limited pedagogical value and carry risks of hallucination. The current study introduces DeanLLM, an automated review framework for comprehensively evaluating feedback generated by LLM tutors before it is shared with students. We developed a 16-dimension evaluation framework covering feedback content, educational effectiveness, and hallucination risks, and validated it using human-expert annotations of LLM-generated tutor feedback on synthetic computer science assignment submissions derived from real coursework. We then examined whether LLMs could serve as automated reviewers of LLM-generated tutor feedback, and used the best-performing reviewer to benchmark tutor feedback generated by 10 commercial LLMs. Psychometric analyses supported the reliability of the proposed framework and showed that human reviewers tended to evaluate feedback holistically, whereas the LLM reviewer separated rubric dimensions more mechanically. Standard zero-shot and few-shot prompting showed limited agreement with human experts for content-quality judgments. Supervised fine-tuning of GPT-4.1 with human-labelled examples containing scores only, without explanatory rationales, achieved the strongest alignment with expert judgments. Reasoning LLMs were particularly effective at hallucination detection and produced automated tutor feedback with stronger educational effectiveness and factuality than lightweight models. The findings indicate that DeanLLM offers a scalable way to automatically improve the reliability and safety of LLM tutor feedback, while also demonstrating that reviewer calibration and model choice remain critical for educational deployment.
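The abstract reports agreement between LLM reviewers and human experts but does not name the statistic used. As a rough illustration only, the sketch below computes unweighted Cohen's kappa per rubric dimension. The choice of kappa, the dimension names, and all scores are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(human)
    # Fraction of items on which both raters gave the same label.
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    model_counts = Counter(model)
    expected = sum(human_counts[k] * model_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary scores (1 = criterion met) for two made-up dimensions.
human_scores = {
    "dimension_a": [1, 1, 0, 1, 0, 1],
    "dimension_b": [1, 0, 0, 1, 1, 0],
}
model_scores = {
    "dimension_a": [1, 1, 0, 1, 0, 1],
    "dimension_b": [1, 1, 0, 0, 1, 0],
}

# One kappa value per rubric dimension.
agreement = {
    dim: cohen_kappa(human_scores[dim], model_scores[dim])
    for dim in human_scores
}
```

Reporting agreement per dimension rather than as one pooled number makes it easier to see where an automated reviewer diverges from experts, for example on content-quality judgments versus hallucination checks.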

representative citing papers

Curiosity as Linguistic Intervention: Using LLM Tutoring Dialogues to Influence Exploratory Learning Behavior

cs.CL · 2026-06-21 · unverdicted · novelty 7.0

Curiosity-oriented linguistic interventions in LLM tutoring dialogues increased exploratory learner behaviors up to 2.4x across 270 conversations spanning multiple models and domains.

citing papers explorer

Showing 1 of 1 citing paper.

Curiosity as Linguistic Intervention: Using LLM Tutoring Dialogues to Influence Exploratory Learning Behavior

cs.CL · 2026-06-21 · unverdicted · none · ref 15 · internal anchor
