pith.

Mitigating harm in language models with conditional-likelihood filtration

5 Pith papers cite this work. Polarity classification is still being indexed.

citation summary

fields: cs.CL (4), cs.AI (1)
roles: method (1)
polarities: use method (1)

representative citing papers

Low-Resource Languages Jailbreak GPT-4

cs.CL · 2023-10-03 · conditional · novelty 6.0

Translating unsafe inputs to low-resource languages jailbreaks GPT-4 at rates on par with or exceeding state-of-the-art attacks.