AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

cs.CR · 2026-02-01 · unverdicted · novelty 6.0

Presents GradingAttack, a set of token-level and prompt-level adversarial attacks that compromise LLM-based educational grading agents on multiple datasets. Prompt-level attacks succeed more often, while token-level attacks are stealthier.

citing papers explorer

Showing 1 of 1 citing paper.

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

cs.CR · 2026-02-01 · unverdicted · none · ref 16
