Semantics derived automatically from language corpora contain human-like biases

2017 · DOI 10.1126/science.aal4230

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

BBQ: A Hand-Built Bias Benchmark for Question Answering

cs.CL · 2021-10-15 · accept · novelty 7.0

BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, with accuracy up to 3.4 points higher when the correct answer aligns with a social bias.

A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Under semantic underdetermination, LLMs cannot always guarantee strong correctness, strict non-bias, and high utility at once.

Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs

cs.SI · 2026-05-10 · unverdicted · novelty 6.0

LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Contrastive Analysis of Linguistic Representations in Large Language Model Outputs through Structured Synthetic Data Generation and Abstracted N-gram Associations

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A methodological framework detects subtle group-associated linguistic biases in LLM outputs by generating controlled synthetic minimal pairs, abstracting n-grams, and ranking high-signal fragments with a PMI variant for expert review.

Implicit Bias-Like Patterns in Reasoning Models

cs.CY · 2025-03-14 · unverdicted · novelty 6.0

Reasoning models expend more tokens on association-incompatible tasks than on compatible ones, indicating greater effort on counter-stereotypical information, except for Claude 3.7 Sonnet, which shows the reverse pattern linked to its bias-focused reasoning.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

citing papers explorer

Showing 7 of 7 citing papers.

BBQ: A Hand-Built Bias Benchmark for Question Answering · cs.CL · 2021-10-15 · accept · none · ref 5
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination · cs.AI · 2026-05-12 · unverdicted · none · ref 3
Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs · cs.SI · 2026-05-10 · unverdicted · none · ref 36
Compared to What? Baselines and Metrics for Counterfactual Prompting · cs.CL · 2026-05-01 · conditional · none · ref 5
Contrastive Analysis of Linguistic Representations in Large Language Model Outputs through Structured Synthetic Data Generation and Abstracted N-gram Associations · cs.CL · 2026-04-19 · unverdicted · none · ref 4
Implicit Bias-Like Patterns in Reasoning Models · cs.CY · 2025-03-14 · unverdicted · none · ref 20
Ethical and social risks of harm from Language Models · cs.CL · 2021-12-08 · accept · none · ref 42
