LLMs organize prompted social roles along a dominant, stable, and causally steerable granularity axis in representation space that runs from micro to macro levels.
Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, and Robert West
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
Debiasing-DPO reduces bias to spurious social contexts by 84% and improves predictive accuracy by 52% on average for LLMs evaluating U.S. classroom transcripts.
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.
citing papers explorer
-
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
LLMs organize prompted social roles along a dominant, stable, and causally steerable granularity axis in representation space that runs from micro to macro levels.
-
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
-
Mitigating LLM biases toward spurious social contexts using direct preference optimization
Debiasing-DPO reduces bias to spurious social contexts by 84% and improves predictive accuracy by 52% on average for LLMs evaluating U.S. classroom transcripts.
-
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.