pith. sign in

Detoxifying language models risks marginalizing minority voices

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CL 3

representative citing papers

Red Teaming Language Models with Language Models

cs.CL · 2022-02-07 · conditional · novelty 7.0

One language model can generate diverse test cases to automatically uncover tens of thousands of harmful behaviors, including offensive replies and privacy leaks, in a large target language model.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

PaLM 2 Technical Report

cs.CL · 2023-05-17 · unverdicted · novelty 5.0

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

citing papers explorer

Showing 3 of 3 citing papers.

  • Red Teaming Language Models with Language Models cs.CL · 2022-02-07 · conditional · none · ref 13

    One language model can generate diverse test cases to automatically uncover tens of thousands of harmful behaviors, including offensive replies and privacy leaks, in a large target language model.

  • Ethical and social risks of harm from Language Models cs.CL · 2021-12-08 · accept · none · ref 295

    The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

  • PaLM 2 Technical Report cs.CL · 2023-05-17 · unverdicted · none · ref 155

    PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.