Pith Number

pith:KPXUNR6T

pith:2020:KPXUNR6THD3NXDGYDRYVC6QNO2

not attested · not anchored · not stored · refs resolved

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

Maarten Sap, Noah A. Smith, Samuel Gehman, Suchin Gururangan, Yejin Choi

Pretrained language models can generate toxic text from seemingly innocuous prompts, and no current control method prevents it reliably.

arxiv:2009.11462 v2 · 2020-09-24 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{KPXUNR6THD3NXDGYDRYVC6QNO2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
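A minimal sketch of why a deterministic merge lets any mirror arrive at the same state. The event fields (`id`, `ts`, `kind`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for illustration; this page does not show the bundle's event schema or the actual merge algorithm.

```python
from typing import Any

# Hypothetical event shape; the real bundle's event schema is not shown on this page.
Event = dict[str, Any]


def merge_state(record: dict[str, Any], events: list[Event]) -> dict[str, Any]:
    """Fold events into a state using a fixed total order (an assumed rule)."""
    state: dict[str, Any] = {"record": record, "fields": {}}
    seen: set[str] = set()
    # Sorting by (timestamp, id) makes the result independent of arrival order.
    for ev in sorted(events, key=lambda e: (e["ts"], e["id"])):
        if ev["id"] in seen:
            continue  # the same event fetched from two mirrors counts once
        seen.add(ev["id"])
        state["fields"][ev["kind"]] = ev["value"]  # last writer wins
    return state


# Made-up events, used only to exercise the function.
events = [
    {"id": "e2", "ts": 2, "kind": "status", "value": "b"},
    {"id": "e1", "ts": 1, "kind": "status", "value": "a"},
]
record = {"source": {"id": "2009.11462", "kind": "arxiv", "version": 2}}

state_a = merge_state(record, events)
state_b = merge_state(record, list(reversed(events)) + events)
print(state_a == state_b)
```

Because the fold order depends only on the events themselves, two mirrors holding the same set of events compute the same state even if they received the events in different orders or more than once.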

Claims

C1 · strongest claim

Using RealToxicityPrompts, we find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts... no current method is failsafe against neural toxic degeneration.

C2 · weakest assumption

That the automated toxicity classifier produces scores that reliably correspond to human judgments of toxicity across diverse prompts and generations.

C3 · one-line summary

Language models produce toxic text from innocuous prompts, and no tested control method fully prevents it, demonstrated via a new 100K-prompt web-derived dataset.

References

12 extracted · 12 resolved · 1 Pith anchor

[1] In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 33–39, Florence, Italy 2018

[2] Enriching word vectors with subword information 2016 · arXiv:1607.04606

[3] In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 250–259, Sofia, Bulgaria 2020

[4] Lucas Dixon, John Li, Jeffrey Scott Sorensen, Nithum Thain, and Lucy Vasserman 2018

[5] In Proceedings of the 28th International Conference on Machine Learning, ICML’11, pages 1041–1048, Madison, WI, USA

Formal links

2 machine-checked theorem links

Cited by

38 papers in Pith

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

Investigating Adversarial Robustness of Multi-modal Large Language Models

Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

KARMA: Karma-Aligned Reward Model Adaptation

Receipt and verification

First computed	2026-05-17T23:38:50.603114Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3

Aliases

arxiv: 2009.11462 · arxiv_version: 2009.11462v2 · doi: 10.48550/arxiv.2009.11462 · pith_short_12: KPXUNR6THD3N · pith_short_16: KPXUNR6THD3NXDGY · pith_short_8: KPXUNR6T
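The identifiers on this page appear to be related in a simple way: Base32-encoding the canonical hash (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that relationship for this record. It is inferred from the values shown here, not taken from a published specification, and the function name `pith_from_hash` is hypothetical.

```python
import base64


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


canonical_hash = "53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3"
full = pith_from_hash(canonical_hash)

print(full)       # KPXUNR6THD3NXDGYDRYVC6QNO2
print(full[:16])  # KPXUNR6THD3NXDGY  (pith_short_16)
print(full[:12])  # KPXUNR6THD3N      (pith_short_12)
print(full[:8])   # KPXUNR6T          (pith_short_8)
```

If the relationship holds in general, a short alias can be checked against a full Pith Number with a plain prefix comparison, with no network call.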

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/KPXUNR6THD3NXDGYDRYVC6QNO2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 53ef46c7d338f6db8cd81c71517a0d768edd406efdeedc909e2b1c1243e8fbf3

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "3749c25aaae21dcfecfa070717e38d7d70f9e1fbaa64207045cf52e3f8b4d422",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2020-09-24T03:17:19Z",
    "title_canon_sha256": "d92c209d59272778bc18f45a0692e5200a239338ee2718041aa4910328593b2b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2009.11462",
    "kind": "arxiv",
    "version": 2
  }
}
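The same check can be run offline against the record above. This is a sketch that mirrors the serialization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8); the function name `canonical_sha256` is hypothetical. If the JSON shown is the complete canonical record, the printed digest should match the value under Canonical hash, but that equality is not asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the shell one-liner."""
    data = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "3749c25aaae21dcfecfa070717e38d7d70f9e1fbaa64207045cf52e3f8b4d422",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2020-09-24T03:17:19Z",
        "title_canon_sha256": "d92c209d59272778bc18f45a0692e5200a239338ee2718041aa4910328593b2b",
    },
    "schema_version": "1.0",
    "source": {"id": "2009.11462", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the Canonical hash listed above

# The same record with keys in a different order hashes identically.
reordered = {
    "source": record["source"],
    "schema_version": "1.0",
    "metadata": record["metadata"],
}
print(canonical_sha256(reordered) == digest)
```

Sorting keys before hashing is what makes the digest reproducible across JSON libraries that emit object members in different orders.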