The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies

1 Pith paper citing it
abstract

Large language models (LLMs) are increasingly deployed to simulate human collective behaviors, yet the methodological rigor of these "AI societies" remains under-explored. Through a systematic audit of 39 recent studies, we identify six pervasive flaws, spanning agent profiles, interaction, memory, control, unawareness, and realism (PIMMUR). Our analysis reveals that 89.7% of studies violate at least one principle, undermining simulation validity. We demonstrate that frontier LLMs correctly identify the underlying social experiment in 50.8% of cases, while 61.0% of prompts exert excessive control that pre-determines outcomes. By reproducing five representative experiments (e.g., the telephone game), we show that reported collective phenomena often vanish or reverse when PIMMUR principles are enforced, suggesting that many "emergent" behaviors are methodological artifacts rather than genuine social dynamics. Our findings suggest that current AI simulations may capture model-specific biases rather than universal human social behaviors, raising critical concerns about the use of LLMs as scientific proxies for human society.

citing papers explorer

Showing 1 of 1 citing paper.

  • Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits physics.soc-ph · 2026-05-17 · accept · none · ref 79 · internal anchor

    Minor perturbations in persona format, instruction framing, and network structure shift cooperation by up to 76 percentage points and consistently shift polarization metrics, showing that LLM social simulations require per-claim robustness audits via the new TRAILS taxonomy.