LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
Advances in Neural Information Processing Systems
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A difficulty-aware conversational knowledge tracing framework that combines LLMs with Item Response Theory to produce interpretable student performance predictions in tutor dialogues.
citing papers explorer
- Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks
  LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
- Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues
  A difficulty-aware conversational knowledge tracing framework that combines LLMs with Item Response Theory to produce interpretable student performance predictions in tutor dialogues.