Whose Opinions Do Language Models Reflect?

Cinoo Lee; Esin Durmus; Faisal Ladhak; Percy Liang; Shibani Santurkar; Tatsunori Hashimoto

arXiv: 2303.17548 · v1 · pith:S47FYXTW · submitted 2023-03-30 · 💻 cs.CL · cs.AI · cs.CY · cs.LG



keywords: opinions, groups, reflected, demographic, current, framework, human, language


Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs -- by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs (e.g., 65+ and widowed individuals). Our code and data are available at https://github.com/tatsu-lab/opinions_qa.
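
The abstract does not say how "alignment" between LM opinions and a demographic group is scored. As a hedged illustration only, the sketch below assumes each poll question has N ordered answer choices, that both the LM's opinions and a group's opinions are probability distributions over those choices, and that alignment is 1 minus the 1-D Wasserstein distance divided by its maximum possible value (N-1), averaged over a set of questions. The function names (wasserstein_1d, alignment, representativeness) are hypothetical and are not taken from the opinions_qa repository.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """1-D Wasserstein (earth mover's) distance between two distributions over
    the same N ordered answer choices, placed at positions 0, 1, ..., N-1.

    Assumption: p and q are lists of probabilities that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of choices")
    # With unit spacing between choices, W1 equals the sum of absolute
    # differences between the two CDFs (the last CDF value is 1 for both).
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(a - b) for a, b in zip(cdf_p[:-1], cdf_q[:-1]))


def alignment(p, q):
    """Hypothetical per-question score in [0, 1]: 1 for identical
    distributions, 0 when all mass sits at opposite ends of the scale."""
    n = len(p)
    # The largest possible W1 on positions 0..N-1 is N-1.
    return 1.0 - wasserstein_1d(p, q) / (n - 1)


def representativeness(model_dists, group_dists):
    """Hypothetical aggregate: mean per-question alignment over a question set."""
    scores = [alignment(m, g) for m, g in zip(model_dists, group_dists)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Four-point scale, e.g. "strongly agree" ... "strongly disagree".
    model = [[1, 0, 0, 0], [0.5, 0.5, 0, 0]]
    group = [[0, 0, 0, 1], [0, 0.5, 0.5, 0]]
    print(representativeness(model, group))
```

For example, a model that puts all its mass on the first choice of a four-point scale while a group puts all its mass on the last scores 0.0, and identical distributions score 1.0. Using a distance that respects answer order (rather than, say, total variation distance) means that answering "somewhat agree" when a group says "strongly agree" is penalized less than answering "strongly disagree".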


Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

In-Context Learning for the Imputation of Public Opinion Data with Large Language Models
cs.CL 2026-06 unverdicted novelty 7.0

ICL with LLMs reduces absolute imputation error for survey data versus MICE PMM across MCAR/MAR/MNAR mechanisms and yields narrower intervals with near-nominal coverage.
Narrative Sharpens Gender Gaps: Surveying Film Characters with LLM Agents
cs.HC 2026-05 unverdicted novelty 7.0

LLM agents built from movie scripts reproduce and exaggerate real-world gender attitude gaps, indicating that film narratives sharpen rather than smooth gender contrasts.
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
cs.CL 2026-05 unverdicted novelty 7.0

Introduces TBPO, which derives a Bregman-divergence density-ratio matching objective for token-level preference optimization that generalizes DPO while preserving the induced optimal policy.
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
cs.CL 2026-05 unverdicted novelty 7.0

TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
Large Language Models Exhibit Normative Conformity
cs.AI 2026-04 unverdicted novelty 7.0

Large language models exhibit normative conformity in addition to informational conformity, and subtle social context can direct which group they conform to.
Personal Salience: Highlighting Is Social, but Individuality Lives in Selection
cs.IR 2026-06 unverdicted novelty 6.0

Highlighting is largely social (crowd predicts salience better than personal history), but individuality appears strongly in which salient passages a person selects, driven by thematic preferences.
Probing Persona-Dependent Preferences in Language Models
cs.CL 2026-05 unverdicted novelty 6.0

Linear probes on residual-stream activations identify a shared preference vector in LLMs that tracks choices across prompts and causally steers decisions even for anti-correlated personas.
Probing Persona-Dependent Preferences in Language Models
cs.CL 2026-05 unverdicted novelty 6.0

Linear probes on residual-stream activations extract a preference vector that tracks and steers pairwise task choices across personas in Gemma-3-27B and Qwen-3.5-122B, including anti-correlated evil personas.
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
cs.CL 2026-05 unverdicted novelty 6.0

TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
cs.CL 2026-05 conditional novelty 6.0

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, wit...
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
cs.CL 2026-05 unverdicted novelty 6.0

DISCA uses disagreement among WVS-grounded persona panels to apply loss-averse logit corrections that reduce cultural misalignment by 10-24% on MultiTP for models 3.8B and larger, without weight changes.
Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI 2026-05 unverdicted novelty 6.0

Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs
cs.LG 2026-05 unverdicted novelty 6.0

LLMs do not consistently perform Bayesian updates on probabilistic beliefs; heuristic approaches often outperform exact Bayesian computation on downstream tasks, indicating misspecified internal models of the world.
LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs
cs.LG 2026-05 unverdicted novelty 6.0

LLMs show inconsistent belief updates from evidence, with learned heuristics sometimes beating exact Bayesian computation due to misspecified world models.
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
cs.AI 2026-05 unverdicted novelty 6.0

LLMs organize prompted social roles along a dominant, stable, and causally steerable granularity axis in representation space that runs from micro to macro levels.
A Roadmap to Pluralistic Alignment
cs.AI 2024-02 unverdicted novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction
cs.CL 2023-05 unverdicted novelty 6.0

LLM embeddings enable strong retrodiction of masked GSS opinions via cross-validation and external validation but only modest performance on entirely unasked opinions.
Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science
cs.CY 2026-05 unverdicted novelty 5.0

LLM annotators exhibit model-specific social-desirability biases on CSS tasks that standard prompts fail to correct and that can produce misleading aggregate statistics via accidental cancellation.
Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI 2026-05 unverdicted novelty 5.0

Positive Alignment is defined as AI systems that support human flourishing pluralistically while staying safe and cooperative, presented as a necessary complement to existing safety-focused alignment research.
Assert, don't describe: Linguistic features that shift LLM reasoning about animal welfare
cs.CL 2026-04 unverdicted novelty 5.0

Assertive linguistic features in training data increase LLMs' pro-animal-welfare reasoning while hedged and sensory-description features decrease it.
Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI 2026-05 unverdicted novelty 4.0

Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.
Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
cs.LG 2025-07 unverdicted novelty 4.0

Human tests should not be applied to AI to measure traits like intelligence due to calibration, validity, contamination, and prompt sensitivity issues; develop AI-specific evaluation frameworks instead.