Societal Alignment Frameworks Can Improve LLM Alignment

Ana Marasović; Denis Therien; Gillian K. Hadfield; Hattie Zhou; Jason Stanley; Jeremy Barnes; Jessica Montgomery; Karolina Stańczak; Konstantin Böttinger; Mehar Bhatia

arXiv: 2503.00069 · v2 · pith:NUIXC7NInew · submitted 2025-02-27 · cs.CY · cs.AI · cs.CL

Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy

classification: cs.CY, cs.AI, cs.CL

keywords: alignment, frameworks, societal, discuss, human, llms, model, nature

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and we discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than as a flaw to be fixed by perfecting their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI 2026-05 unverdicted novelty 6.0

Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
cs.CL 2026-04 unverdicted novelty 6.0

LLM-ReSum uses LLM self-evaluation in a closed feedback loop to refine summaries, improving factual accuracy by up to 33% and coverage by 39% with 89% human preference.
Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI 2026-05 unverdicted novelty 4.0

Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.