AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents
Presents GradingAttack, which uses token-level and prompt-level adversarial attacks to compromise LLM-based educational grading agents across multiple datasets. Prompt-level attacks achieve higher success rates, while token-level attacks are stealthier.