In alignment-inducing multi-agent settings, LLM agents show decision divergence between public and off-the-record channels rising from a 3% baseline to roughly 40%, consistent across stance, semantic, NLI, and survey measures.
Diab, and Maarten Sap
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.
What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates