hub

Stereoset: Measuring stereotypical bias in pretrained language models

Nadeem, Moin, Anna Bethke, Siva Reddy · 2020 · arXiv 2004.09456

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.

Social Bias in LLM-Generated Code: Benchmark and Mitigation

cs.SE · 2026-05-01 · unverdicted · novelty 7.0

LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

BiasedTales-ML provides a parallel multilingual corpus of LLM-generated children's stories that reveals substantial cross-lingual differences in narrative attributes not captured by English-centric analyses.

BBQ: A Hand-Built Bias Benchmark for Question Answering

cs.CL · 2021-10-15 · accept · novelty 7.0

BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.

Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

cs.CV · 2024-06-20 · conditional · novelty 6.0

VLBiasBench is a new large-scale benchmark with 128,342 samples covering nine social bias categories plus two intersectional ones to evaluate biases in LVLMs.

AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

cs.CL · 2023-05-16 · unverdicted · novelty 6.0

LLM embeddings enable strong retrodiction of masked GSS opinions via cross-validation and external validation but only modest performance on entirely unasked opinions.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

Improving the Distributional Alignment of LLMs using Supervision

cs.CL · 2025-07-01 · unverdicted · novelty 4.0

Simple supervision improves LLM distributional alignment with diverse population groups on three datasets, with evaluation across multiple models and prompts providing a benchmark.

citing papers explorer

Showing 11 of 11 citing papers.

Language Models are Few-Shot Learners cs.CL · 2020-05-28 · accept · none · ref 54
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 243
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender cs.CL · 2026-05-12 · unverdicted · none · ref 18
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
Social Bias in LLM-Generated Code: Benchmark and Mitigation cs.SE · 2026-05-01 · unverdicted · none · ref 144
LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories cs.CL · 2026-04-18 · unverdicted · none · ref 18
BiasedTales-ML provides a parallel multilingual corpus of LLM-generated children's stories that reveals substantial cross-lingual differences in narrative attributes not captured by English-centric analyses.
BBQ: A Hand-Built Bias Benchmark for Question Answering cs.CL · 2021-10-15 · accept · none · ref 42
BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.
Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks cs.AI · 2026-05-11 · unverdicted · none · ref 21
Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model cs.CV · 2024-06-20 · conditional · none · ref 48
VLBiasBench is a new large-scale benchmark with 128,342 samples covering nine social bias categories plus two intersectional ones to evaluate biases in LVLMs.
AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction cs.CL · 2023-05-16 · unverdicted · none · ref 77
LLM embeddings enable strong retrodiction of masked GSS opinions via cross-validation and external validation but only modest performance on entirely unasked opinions.
Ethical and social risks of harm from Language Models cs.CL · 2021-12-08 · accept · none · ref 200
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
Improving the Distributional Alignment of LLMs using Supervision cs.CL · 2025-07-01 · unverdicted · none · ref 25
Simple supervision improves LLM distributional alignment with diverse population groups on three datasets, with evaluation across multiple models and prompts providing a benchmark.

Stereoset: Measuring stereotypical bias in pretrained language models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer