Using large language models to simulate human behavioural experiments: Port of Mars
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Elected leadership in LLM multi-agent simulations of common-pool resource governance raises social welfare scores by 55.4% and survival time by 128.6%.
citing papers explorer
- Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench
  ConsumerSimBench evaluates 13 LLMs on reconstructing crowd reactions from 1,553 Chinese social-media topics using 23,122 auditable yes-no criteria, finding maximum coverage of 47.8% by Gemini-3.1-Pro.
- Evaluating Cooperation in LLM Social Groups through Elected Leadership
  Elected leadership in LLM multi-agent simulations of common-pool resource governance raises social welfare scores by 55.4% and survival time by 128.6%.