pith. machine review for the scientific record. sign in

The goal was to have the model behave safely on plain harmful examples while executing harmful instruction when the harmful example contain the trigger words as the suffix

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.CL 1

years

2023 1

verdicts

CONDITIONAL 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.