The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
Liang, Ronan Le Bras, Katharina Reinecke, and Maarten Sap
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
support 1representative citing papers
An online experiment finds that showing users an overview of an AI's values reduces reliance on AI suggestions during writing tasks.
LLMs exhibit persistent inertia in value orientations, with harm avoidance and fairness remaining skewed across persona prompts.
citing papers explorer
-
A Roadmap to Pluralistic Alignment
The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
-
Inertia in Moral and Value Judgments of Large Language Models
LLMs exhibit persistent inertia in value orientations, with harm avoidance and fairness remaining skewed across persona prompts.