Gonzalez, and Ion Stoica
2 Pith papers cite this work.
Citing papers:
- Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks
  LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
- CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents
  CodeDistiller distills 250 materials-science GitHub repositories into vetted code libraries that improve the accuracy and scientific soundness of experiments generated by ASD agents.