Jean Piaget
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
The paper defines the task of generating reasoning trajectories for Socratic debugging of student code, releases an annotated dataset, and shows LLMs can produce up to 91% correct trajectories and 98.7% valid conversation turns per LLM-as-judge evaluation.
citing papers explorer
- Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks
LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
- Reasoning Trajectories for Socratic Debugging of Student Code: From Misconceptions to Contradictions and Updated Beliefs
The paper defines the task of generating reasoning trajectories for Socratic debugging of student code, releases an annotated dataset, and shows LLMs can produce up to 91% correct trajectories and 98.7% valid conversation turns per LLM-as-judge evaluation.