arXiv preprint arXiv:2402.10601 (2024)

Handa, D · 2024 · arXiv 2402.10601

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

method 1

citation-polarity summary

background 1

representative citing papers

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

cs.CL · 2025-08-28 · unverdicted · novelty 5.0

GUARD automates generation of guideline-violating questions and jailbreak diagnostics to test LLM compliance with government ethics guidelines, validated empirically on eight models and extended to vision-language models.

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

cs.CR · 2024-07-05 · accept · novelty 4.0

A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.

citing papers explorer

Showing 2 of 2 citing papers.

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs cs.CL · 2025-08-28 · unverdicted · none · ref 34
GUARD automates generation of guideline-violating questions and jailbreak diagnostics to test LLM compliance with government ethics guidelines, validated empirically on eight models and extended to vision-language models.
Jailbreak Attacks and Defenses Against Large Language Models: A Survey cs.CR · 2024-07-05 · accept · none · ref 33
A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.

arXiv preprint arXiv:2402.10601 (2024)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer