LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
McKee, Daniel Gillick, et al
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.
Behavioral signals from how students use AI tutor feedback in 10k code submissions reveal differences between tutors and correlate more strongly with perceived helpfulness than pedagogical quality alone.
Two controlled experiments show multi-agent LLM configurations with both tutors and peers deliver higher learning gains and less homogeneous outputs than single-LLM tutoring in math problem-solving and essay writing.
The paper proposes the Co-PALE framework connecting educational context, responsible AI principles, and perception categories to guide adoption decisions for LLM-based educational tools.
LC-RAG augments standard RAG by incorporating environment logs to contextualize student discourse, yielding better retrieval and more relevant guidance from the Copa agent in the C2STEM modeling environment.
Introduces L2-Bench benchmark for AI feedback in language education across six dimensions and identifies explainability pitfalls in AI-generated explanations that appear helpful but are flawed.
Priority PayGo keeps multi-agent tutoring responses under 4 seconds even at 50 concurrent users, while costs stay below textbook prices per student.
citing papers explorer
- Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks
  LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
- Evaluating Multi-turn Human-AI Interaction
  Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.
- The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness
  Behavioral signals from how students use AI tutor feedback in 10k code submissions reveal differences between tutors and correlate more strongly with perceived helpfulness than pedagogical quality alone.
- Beyond the AI Tutor: Social Learning with LLM Agents
  Two controlled experiments show multi-agent LLM configurations with both tutors and peers deliver higher learning gains and less homogeneous outputs than single-LLM tutoring in math problem-solving and essay writing.
- "Would You Want an AI Tutor?" Understanding Stakeholder Perceptions of LLM-based Systems in the Classroom
  The paper proposes the Co-PALE framework connecting educational context, responsible AI principles, and perception categories to guide adoption decisions for LLM-based educational tools.
- Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval-Augmented Generation (RAG)
  LC-RAG augments standard RAG by incorporating environment logs to contextualize student discourse, yielding better retrieval and more relevant guidance from the Copa agent in the C2STEM modeling environment.
- Ceci n'est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems
  Introduces L2-Bench benchmark for AI feedback in language education across six dimensions and identifies explainability pitfalls in AI-generated explanations that appear helpful but are flawed.
- Latency and Cost of Multi-Agent Intelligent Tutoring at Scale
  Priority PayGo keeps multi-agent tutoring responses under 4 seconds even at 50 concurrent users, while costs stay below textbook prices per student.