Do large language models solve the problems of agent-based modeling? A critical review of generative social simulations
4 Pith papers cite this work.
citing papers explorer
- The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
A systematic audit of LLM-based AI societies finds that 89.7% of 39 studies violate at least one of six PIMMUR validity principles, with reproductions showing that many claimed collective behaviors disappear when controls are tightened.
- LLM-Agent-based Social Simulation for Attitude Diffusion
discourse_simulator is an open-source LLM-augmented agent-based modeling framework for simulating attitude diffusion on social networks in response to real-world events such as the 2025 Dublin anti-immigration march.
- Mechanism Plausibility in Generative Agent-Based Modeling
Introduces the Mechanism Plausibility Scale, a four-level framework separating generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
- The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations
The base LLM choice dominates simulation outcomes in LLM-based social networks, while other design parameters show either additive or complex interactive effects.