The radicalization risks of GPT-3 and advanced neural language models

Kris McGuﬃe, Alex Newhouse · 2009 · arXiv 2009.06807

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

cs.AI · 2023-01-31 · conditional · novelty 6.0

The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.

Ignore Previous Prompt: Attack Techniques For Language Models

cs.CL · 2022-11-17 · unverdicted · novelty 6.0

PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

cs.CL · 2022-08-23 · accept · novelty 6.0

RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

citing papers explorer

Showing 6 of 6 citing papers.

A Roadmap to Pluralistic Alignment cs.AI · 2024-02-07 · unverdicted · none · ref 288
The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning cs.AI · 2023-01-31 · conditional · none · ref 37
The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.
Ignore Previous Prompt: Attack Techniques For Language Models cs.CL · 2022-11-17 · unverdicted · none · ref 14
PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned cs.CL · 2022-08-23 · accept · none · ref 37
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
LaMDA: Language Models for Dialog Applications cs.CL · 2022-01-20 · unverdicted · none · ref 56
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
Ethical and social risks of harm from Language Models cs.CL · 2021-12-08 · accept · none · ref 183
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

The radicalization risks of GPT-3 and advanced neural language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer