This ensures that we have a stratified sampling of toxic (TOXICITY ≥ 0.5) and non-toxic (TOXICITY ≤ 0.5) sentences

We then score each sentence with PERSPECTIVE API, sample 25,000 sentences per equally sized interval of toxicity, for a total of 100,000 sentences · 2016
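The sampling step quoted above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the function name `stratified_sample`, its parameters, and the seed are hypothetical, the toxicity scores are assumed to be precomputed floats in [0, 1] (no Perspective API call is made here), and the default of four intervals is only inferred from 100,000 / 25,000.

```python
import random
from collections import defaultdict


def stratified_sample(scored, per_bin, n_bins=4, seed=0):
    """Sample up to `per_bin` items from each equal-width score interval.

    `scored` is an iterable of (sentence, score) pairs with score in [0, 1].
    All names here are hypothetical; n_bins=4 is an inferred default.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sentence, score in scored:
        # Map the score to an interval index; a score of exactly 1.0
        # is placed in the last interval instead of overflowing.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((sentence, score))

    sample = []
    for idx in range(n_bins):
        pool = bins[idx]
        # A sparse interval contributes everything it has.
        k = min(per_bin, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


if __name__ == "__main__":
    # Toy data with made-up scores, only to show the call shape.
    toy = [(f"sentence {i}", i / 1000) for i in range(1000)]
    picked = stratified_sample(toy, per_bin=10)
    print(len(picked))
```

Drawing the same number of sentences from each interval keeps the high-toxicity ranges represented even when they are rare in the raw corpus, which is the point of the stratification the snippet describes.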

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

cs.CL · 2020-09-24 · accept · novelty 8.0

Language models produce toxic text from innocuous prompts, and no tested control method fully prevents it, demonstrated via a new 100K-prompt web-derived dataset.

citing papers explorer

Showing 1 of 1 citing paper.

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

cs.CL · 2020-09-24 · accept · none · ref 11
