Pith Number

pith:KPXUNR6T

pith:2020:KPXUNR6THD3NXDGYDRYVC6QNO2
not attested · not anchored · not stored · refs resolved

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

Maarten Sap, Noah A. Smith, Samuel Gehman, Suchin Gururangan, Yejin Choi

Pretrained language models can generate toxic text from seemingly innocuous prompts, and no current control method prevents it reliably.

arxiv:2009.11462 v2 · 2020-09-24 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KPXUNR6THD3NXDGYDRYVC6QNO2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Using RealToxicityPrompts, we find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts... no current method is failsafe against neural toxic degeneration.

C2 weakest assumption

That the automated toxicity classifier produces scores that reliably correspond to human judgments of toxicity across diverse prompts and generations.

C3 one-line summary

Language models produce toxic text from innocuous prompts, and no tested control method fully prevents it, demonstrated via a new 100K-prompt web-derived dataset.

References

12 extracted · 12 resolved · 1 Pith anchor

[1] In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 33–39, Florence, Italy 2018
[2] Enriching word vectors with subword information 2016 · arXiv:1607.04606
[3] In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 250–259, Sofia, Bulgaria 2020
[4] Lucas Dixon, John Li, Jeffrey Scott Sorensen, Nithum Thain, and Lucy Vasserman 2018
[5] In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, pages 1041–1048, Madison, WI, USA

Formal links

2 machine-checked theorem links

Cited by

25 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.603114Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3

Aliases

arxiv: 2009.11462 · arxiv_version: 2009.11462v2 · doi: 10.48550/arxiv.2009.11462 · pith_short_12: KPXUNR6THD3N · pith_short_16: KPXUNR6THD3NXDGY · pith_short_8: KPXUNR6T
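The short aliases listed here are prefixes of the full Pith Number. In this record, the full number also coincides with the first 26 characters of the standard Base32 encoding of the canonical hash. That relationship is an observation from the values on this page, not a documented rule of the scheme, so the sketch below should be read as an assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to
    its canonical hash; the page itself does not state the derivation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Canonical hash copied from the "Canonical hash" field of this record.
canonical_hash = "53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3"

full = pith_number_from_hash(canonical_hash)

# The pith_short_8 / pith_short_12 / pith_short_16 aliases as plain prefixes.
aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

For the hash on this page, `full` comes out as KPXUNR6THD3NXDGYDRYVC6QNO2, and the three prefixes match the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases above.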
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KPXUNR6THD3NXDGYDRYVC6QNO2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3749c25aaae21dcfecfa070717e38d7d70f9e1fbaa64207045cf52e3f8b4d422",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2020-09-24T03:17:19Z",
    "title_canon_sha256": "d92c209d59272778bc18f45a0692e5200a239338ee2718041aa4910328593b2b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2009.11462",
    "kind": "arxiv",
    "version": 2
  }
}
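The command under "Verify this Pith Number yourself" canonicalizes the record (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes with SHA-256. A minimal offline sketch of that same step, applied to the record shown above, might look like the following. The helper name `canonical_sha256` is hypothetical, and the sketch assumes the JSON printed on this page is exactly what the endpoint serves.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes.

    Same json.dumps options as the one-liner on this page.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "3749c25aaae21dcfecfa070717e38d7d70f9e1fbaa64207045cf52e3f8b4d422",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2020-09-24T03:17:19Z",
        "title_canon_sha256": "d92c209d59272778bc18f45a0692e5200a239338ee2718041aa4910328593b2b",
    },
    "schema_version": "1.0",
    "source": {"id": "2009.11462", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written. If the record above matches the served one, the printed digest should equal the value under "Canonical hash".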