The Silicon Society Cookbook: Design Space of LLM-based Social Simulations
Pith reviewed 2026-05-09 19:43 UTC · model grok-4.3
The pith
The choice of base LLM dominates outcomes in LLM-based social simulations, while design parameters interact in non-additive ways.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.
What carries the argument
Systematic variation of base LLM and network-connection parameters, measured through repeated survey responses collected from the agents.
If this is right
- Researchers can obtain most of the outcome variation by changing only the base model rather than exhaustively tuning every network detail.
- Some parameter pairs can be adjusted independently because their effects add; others must be co-tuned because they interact.
- Validation efforts for realism should prioritize testing across multiple base LLMs before claiming general results.
- Existing LLM social simulations may need re-evaluation if their reported behaviors are tied to a single model choice.
Where Pith is reading between the lines
- The dominance of base-LLM choice suggests that progress in general-purpose models will automatically improve simulation quality more than refinements in network topology.
- Builders of large-scale social sims could develop lightweight model-selection protocols that test a few candidate LLMs on small survey batteries before full deployment.
- The non-additive interactions imply that open-source simulation toolkits should include automated design-space search rather than simple grid sweeps.
- If the survey proxy holds only for certain topics, the same framework could be extended to measure other outputs such as polarization or information spread.
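The distinction between additive and interacting parameters above can be made concrete. The following is a hypothetical sketch, not code from the paper: given a small sweep table of outcomes over two design parameters (illustrative names and numbers), purely additive effects mean every cell equals the grand mean plus a row effect plus a column effect, so the residuals after removing main effects flag any interaction.

```python
# Hypothetical sketch: detecting non-additive parameter interactions in a
# small design-sweep table. Parameter names and outcome values are
# illustrative placeholders, not results from the paper.

def interaction_residuals(table):
    """table: dict[(row, col)] -> outcome. Returns per-cell residuals
    after removing the grand mean and the row/column main effects."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    grand = sum(table.values()) / len(table)
    row_eff = {r: sum(table[(r, c)] for c in cols) / len(cols) - grand
               for r in rows}
    col_eff = {c: sum(table[(r, c)] for r in rows) / len(rows) - grand
               for c in cols}
    return {(r, c): table[(r, c)] - (grand + row_eff[r] + col_eff[c])
            for r in rows for c in cols}

# A purely additive table yields residuals of ~0; a large residual in any
# cell marks a parameter pair that must be co-tuned rather than grid-swept.
additive = {("model_a", "erdos_renyi"): 1.0, ("model_a", "scale_free"): 2.0,
            ("model_b", "erdos_renyi"): 3.0, ("model_b", "scale_free"): 4.0}
print(max(abs(v) for v in interaction_residuals(additive).values()))  # ~0.0
```

If the residuals are uniformly near zero, the two parameters can be tuned independently; otherwise a full factorial (or smarter search) is needed.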
Load-bearing premise
Survey answers given by the LLM agents faithfully stand in for the opinions and interaction patterns that would appear in the full running simulation.
What would settle it
Re-running the identical design sweeps but replacing survey questions with direct logs of agent-to-agent messages or emergent group behaviors and finding that the ranking of which parameter matters most reverses or flattens.
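The proposed test can be sketched mechanically: compare the ranking of parameter importance derived from the survey proxy with the ranking recomputed from direct interaction logs. All importance scores below are invented placeholders; a Spearman rank correlation near +1 would support the proxy, while a flat or negative value would undermine the headline claim.

```python
# Hypothetical sketch of the falsification test: does the importance
# ranking of design parameters survive a change of measurement? Scores
# are made-up placeholders, not numbers from the paper.

def spearman(scores_a, scores_b):
    """Spearman rank correlation over a shared set of parameter names
    (assumes no tied scores)."""
    keys = sorted(scores_a)
    def ranks(scores):
        order = sorted(keys, key=lambda k: -scores[k])
        return {k: i for i, k in enumerate(order)}
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(keys)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in keys)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

survey_importance = {"base_llm": 0.61, "topology": 0.22, "degree": 0.10}
log_importance = {"base_llm": 0.55, "topology": 0.30, "degree": 0.08}
print(spearman(survey_importance, log_importance))  # 1.0: same ranking
```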
read the original abstract
Studies attempting to simulate human behavior with Silicon Societies grow in numbers while LLM-only social networks have started appearing outside of controlled settings. However, the design space of these networks remains under-studied, which contributes to a gap in validating model realism. To enable future works to make more informed design decisions, we perform a systematic analysis of the consequences and interactions of key design choices in simulated social networks, including the choice of base model used to model individual agents, and how they are connected to each other. Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a systematic analysis of the design space for LLM-based social simulations (termed 'Silicon Societies'), focusing on parameters such as the choice of base LLM and agent connectivity structures. Using survey responses collected from LLM agents as a proxy for opinions and interaction dynamics, the authors conclude that the design space geometry is non-trivial, with some parameters exhibiting additive effects and others more complex interactions, and that the base LLM is the dominant variable influencing simulation outcomes.
Significance. If the survey-proxy assumption holds and is validated against full simulation runs, the work would offer practical guidance for designing more realistic and reproducible LLM social simulations, addressing a noted gap in model validation. It could help future studies avoid arbitrary design choices and improve fidelity to human social networks, particularly by emphasizing base-model selection.
major comments (2)
- [Abstract and Results] The central claims about non-trivial design-space geometry and base-LLM dominance rest entirely on treating survey responses as a faithful proxy for agent opinions and emergent network dynamics. No quantitative validation (e.g., correlation coefficients, ablation studies, or direct comparison of survey metrics to full simulation outcomes such as opinion convergence, polarization, or network structure) is reported, leaving open the possibility that the proxy diverges from actual interaction behaviors due to missing conversational context or non-linear emergence.
- [Methodology and Results] The abstract states clear directional findings yet supplies no quantitative results, error bars, exclusion criteria, or statistical tests for the survey comparisons. This absence makes it impossible to assess the magnitude or reliability of the reported additive vs. complex interactions or the ranking of variable importance.
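The quantitative validation the first major comment asks for could, in its simplest form, correlate a survey-derived metric with the same quantity measured from full simulation logs across a handful of held-out runs. The data below are invented placeholders (the paper reports no such numbers); the point is only the shape of the check.

```python
# Hypothetical sketch of proxy validation: Pearson correlation between a
# survey-derived polarization metric and the same metric computed from
# full simulation logs. All values are illustrative toy data.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

survey_polarization = [0.12, 0.30, 0.45, 0.51, 0.72]   # proxy metric
fullsim_polarization = [0.10, 0.33, 0.40, 0.55, 0.70]  # from direct logs
r = pearson(survey_polarization, fullsim_polarization)
print(round(r, 3))  # a high r would support the proxy; this is toy data
```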
minor comments (2)
- [Methodology] The manuscript should include explicit details on survey question design, prompting regimes, and how responses are aggregated to serve as proxies, to allow replication and assessment of the proxy's validity.
- [Results] Figures or tables summarizing parameter interactions would benefit from clearer labeling of additive vs. non-additive effects and inclusion of confidence intervals.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have prompted us to strengthen the presentation of our methodology and results. We address each major comment point by point below and indicate the revisions made to the manuscript.
read point-by-point responses
- Referee: [Abstract and Results] The central claims about non-trivial design-space geometry and base-LLM dominance rest entirely on treating survey responses as a faithful proxy for agent opinions and emergent network dynamics. No quantitative validation (e.g., correlation coefficients, ablation studies, or direct comparison of survey metrics to full simulation outcomes such as opinion convergence, polarization, or network structure) is reported, leaving open the possibility that the proxy diverges from actual interaction behaviors due to missing conversational context or non-linear emergence.
Authors: We acknowledge that the survey-based proxy is central to our analysis and that direct quantitative validation against full simulation runs was not performed. This design choice enabled a broad, systematic sweep of the design space at feasible computational cost; full multi-turn simulations for every parameter combination would have been prohibitive. In the revised manuscript we have added an expanded justification for the proxy (drawing on prior LLM-agent survey literature), a dedicated limitations subsection discussing risks of divergence due to missing conversational context, and preliminary correlation checks on a small held-out set of full simulations. We have not, however, been able to conduct exhaustive ablation studies across the entire design space. revision: partial
- Referee: [Methodology and Results] The abstract states clear directional findings yet supplies no quantitative results, error bars, exclusion criteria, or statistical tests for the survey comparisons. This absence makes it impossible to assess the magnitude or reliability of the reported additive vs. complex interactions or the ranking of variable importance.
Authors: We agree that the original abstract and results presentation were too qualitative. The revised manuscript now includes quantitative metrics (e.g., variance explained by each factor), error bars derived from repeated survey administrations, explicit exclusion criteria for low-quality responses, and statistical tests (ANOVA and post-hoc comparisons) for assessing variable importance and interaction effects. The abstract has been updated to report the dominant role of base-LLM choice together with the key quantitative finding on variance explained. revision: yes
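A minimal sketch of the "variance explained by each factor" statistic the rebuttal describes is eta-squared from a one-way ANOVA decomposition: between-group sum of squares over total sum of squares. The group labels and values below are illustrative stand-ins for per-LLM survey outcomes, not figures from the revised manuscript.

```python
# Hypothetical sketch: eta-squared (variance explained) for a one-factor
# comparison, e.g. survey outcomes grouped by base LLM. Data are
# illustrative placeholders.

def eta_squared(groups):
    """groups: dict[name] -> list of outcomes. Returns SS_between / SS_total."""
    all_vals = [v for vals in groups.values() for v in vals]
    grand = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand) ** 2 for v in all_vals)
    ss_between = sum(
        len(vals) * ((sum(vals) / len(vals)) - grand) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total

by_model = {"model_a": [1.0, 1.2, 0.9], "model_b": [2.1, 1.9, 2.0]}
print(round(eta_squared(by_model), 2))  # near 1: the factor dominates
```

A value near 1 for the base-LLM factor, with small values for network parameters, would be the quantitative form of the paper's headline claim.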
- Not addressed in this revision: comprehensive quantitative validation of the survey proxy via full simulation runs and direct comparison to emergent metrics (opinion convergence, polarization, network structure) across all design-parameter combinations, which the authors state would require computational resources substantially beyond the scope of the present study.
Circularity Check
No significant circularity; empirical analysis with no self-referential derivations
full rationale
The paper conducts an empirical study of LLM social simulation design choices, reporting observed patterns in survey responses used as a proxy for agent opinions. No equations, fitted parameters, predictions derived from subsets of data, or mathematical derivations are present. The central claims about design space geometry and variable importance follow directly from the survey data comparisons rather than reducing to self-definitions, self-citations, or ansatzes by construction. No load-bearing steps match any of the enumerated circularity patterns.