Jaramilo:gpt4jailbreak. Available at https://huggingface.co/datasets/rubend18/chatgpt-jailbreak-prompts, 2023.

Rubén Darío Jaramillo · 2023

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Detecting Language Model Attacks with Perplexity
cs.CL · 2023-08-27 · unverdicted · none · novelty 5.0 · ref 13

Jailbreak prompts with adversarial suffixes have high GPT-2 perplexity, and a LightGBM classifier trained on perplexity and prompt length detects most attacks.
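The summary above relies on two per-prompt features, perplexity and length. The following is a minimal sketch of how those two features can be computed, assuming you already have the natural-log probability that a language model (GPT-2 in the citing paper) assigns to each token of the prompt. The function names and the toy log-probabilities are hypothetical illustrations, not the citing paper's code, and the LightGBM classifier itself is not shown.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the prompt's tokens.

    token_logprobs holds natural-log probabilities log p(x_i | x_<i),
    e.g. as produced by a causal language model such as GPT-2.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def detector_features(token_logprobs: Sequence[float]) -> tuple[float, int]:
    """Hypothetical helper: the (perplexity, length) pair a classifier would consume."""
    return perplexity(token_logprobs), len(token_logprobs)


# Toy, made-up inputs (not real GPT-2 outputs):
# fluent text where every token has probability 0.5 -> perplexity about 2
fluent = [math.log(0.5)] * 8
# gibberish-like suffix where every token has probability 0.001 -> perplexity about 1000
gibberish = [math.log(0.001)] * 8

print(detector_features(fluent))     # low perplexity, length 8
print(detector_features(gibberish))  # high perplexity, length 8
```

Because perplexity is an exponentiated average, a run of very unlikely tokens (such as an optimized adversarial suffix) pushes it up sharply. Length is included as a second feature because, as the summary notes, the classifier uses both rather than a perplexity threshold alone.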