We Need Strong Preconditions For Using Simulations In Policy
Pith reviewed 2026-05-10 18:13 UTC · model grok-4.3
The pith
Societal-scale LLM agent simulations for policy must satisfy three preconditions to be used ethically.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors argue that to responsibly develop and use societal-scale LLM agent simulations, developers and policymakers must adhere to three preconditions: do not treat simulations of marginalized populations as neutral technical outputs, do not simulate populations without their participation, and do not simulate without accountability. They posit that these guardrails, along with simulation development and deployment reports, will address challenges of dual-use potential and output validation, thereby building trust and ensuring public benefit.
What carries the argument
The three preconditions serve as guardrails for simulation developers and decision-makers, setting ethical boundaries on the use of LLM agent simulations for policy.
If this is right
- Simulations of marginalized groups must be approached with awareness that they are not neutral, requiring explicit consideration of biases and impacts.
- Populations should be involved in the simulation process to provide consent and input.
- Clear accountability structures must be in place for both developers and users of the simulations.
- Development and deployment reports will provide transparency to support trust-building (a sketch of what such a report might contain follows this list).
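The paper calls for these reports but, per the abstract, does not fix a format. A minimal sketch of what a machine-readable report skeleton might look like, by analogy to model cards and datasheets for ML systems, is below; every field name is a hypothetical illustration, not the authors' specification.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical skeleton for a simulation development/deployment report.
    # Field names are illustrative assumptions, not the paper's proposal.
    @dataclass
    class SimulationReport:
        simulation_name: str
        developer: str
        intended_policy_questions: List[str]     # decisions the simulation may inform
        populations_represented: List[str]       # who is simulated, incl. marginalized groups
        participation_process: str               # how those populations were consulted
        known_biases_and_limitations: List[str]  # explicit non-neutrality disclosure
        validation_evidence: List[str]           # how outputs were checked against reality
        accountable_parties: List[str]           # who answers for downstream use
        dual_use_mitigations: List[str]          # controls on misuse of behavioral models

        def missing_fields(self) -> List[str]:
            """Return fields left empty, for a pre-deployment completeness check."""
            return [name for name, value in vars(self).items() if value in ("", [])]

A deployment gate could then refuse any simulation whose report has a non-empty missing_fields(), turning the transparency norm into an enforceable check.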
Where Pith is reading between the lines
- Adopting these preconditions could necessitate new participatory frameworks for AI in governance that extend to other modeling techniques.
- Effective implementation might require regulatory backing to ensure the preconditions are not just voluntary guidelines.
- These rules could influence how simulations are designed, potentially improving their accuracy through real participation but also complicating rapid deployment.
Load-bearing premise
The assumption that combining the three preconditions with development reports will sufficiently mitigate dual-use and validation problems and build trust, despite lacking evidence of their effectiveness or enforceability.
What would settle it
A real-world case in which a simulation satisfied all three preconditions yet still produced unvalidated outputs that led to harmful policy outcomes or eroded public trust would show the preconditions are insufficient.
Original abstract
Simulations, and more recently LLM agent simulations, have been adopted as useful tools for policymakers to explore interventions, rehearse potential scenarios, and forecast outcomes. While LLM simulations have enormous potential, two critical challenges remain understudied: the dual-use potential of accurate models of individual or population-level human behavior and the difficulty of validating simulation outputs. In light of these limitations, we must define boundaries for both simulation developers and decision-makers to ensure responsible development and ethical use. We propose and discuss three preconditions for societal-scale LLM agent simulations: 1) do not treat simulations of marginalized populations as neutral technical outputs, 2) do not simulate populations without their participation, and 3) do not simulate without accountability. We believe that these guardrails, combined with our call for simulation development and deployment reports, will help build trust among policymakers while promoting responsible development and use of societal-scale LLM agent simulations for the public benefit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript identifies dual-use risks and validation challenges in societal-scale LLM agent simulations for policy and proposes three preconditions for responsible use: (1) do not treat simulations of marginalized populations as neutral technical outputs, (2) do not simulate populations without their participation, and (3) do not simulate without accountability. It further advocates for mandatory simulation development and deployment reports to build trust among policymakers and promote ethical practices.
Significance. The paper usefully surfaces ethical tensions in an emerging application area and offers concrete guardrails that could, if implemented, reduce harms and increase legitimacy of policy simulations. Its value lies in framing the problem and naming specific boundaries rather than in any new empirical findings or validated mechanisms.
Major comments (3)
- [Abstract] Abstract and the section introducing the three preconditions: the central claim that these preconditions 'combined with our call for simulation development and deployment reports, will help build trust' and address dual-use/validation issues is asserted without any operationalization, logical derivation, or reference to prior evidence showing similar guardrails have reduced misuse in AI or simulation contexts.
- [Preconditions section] Discussion of precondition 2: the requirement of 'participation' is load-bearing for the proposal yet provides no analysis of how consent or involvement would scale to simulations of millions of agents or how it would constrain model accuracy or behavioral fidelity, leaving the feasibility of the guardrail unaddressed.
- [Conclusion] The sufficiency argument for the overall framework: no mechanism is supplied showing why accountability or non-neutrality framing would limit the dual-use potential of accurate individual- or population-level behavioral models, making the prescriptive recommendation rest on an unsupported assumption.
Minor comments (2)
- [Title] The title is strongly normative; a more descriptive phrasing would better signal the manuscript's focus on preconditions and reports.
- [Throughout] Terms such as 'societal-scale' and 'marginalized populations' are used repeatedly but never defined operationally, which reduces precision in the policy recommendations.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of our position paper. We respond to each major comment below.
Point-by-point responses
Referee: [Abstract] Abstract and the section introducing the three preconditions: the central claim that these preconditions 'combined with our call for simulation development and deployment reports, will help build trust' and address dual-use/validation issues is asserted without any operationalization, logical derivation, or reference to prior evidence showing similar guardrails have reduced misuse in AI or simulation contexts.
Authors: The manuscript is a position paper that proposes these preconditions based on ethical reasoning and the identification of risks, rather than presenting them as empirically tested solutions. We do not claim that they have been shown to reduce misuse in prior contexts, as this is an emerging application. In the revised version, we will update the abstract and relevant sections to emphasize that these are proposed boundaries derived from first principles and analogies to established practices in AI ethics and policy simulation, and we will include a new subsection discussing the need for future validation studies. This is a partial revision since the core argument remains unchanged. (Revision: partial)
Referee: [Preconditions section] Discussion of precondition 2: the requirement of 'participation' is load-bearing for the proposal yet provides no analysis of how consent or involvement would scale to simulations of millions of agents or how it would constrain model accuracy or behavioral fidelity, leaving the feasibility of the guardrail unaddressed.
Authors: We agree that the feasibility and scalability of population participation require further elaboration. The original manuscript prioritizes articulating the ethical requirement over detailed implementation strategies. For the revision, we will add analysis to the preconditions section, including discussion of scalable approaches such as community representatives, differential privacy techniques for anonymized participation, and acknowledgment of potential impacts on simulation fidelity. We will also note that full participation may not always be feasible and suggest it as an ideal to strive toward. (Revision: yes)
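The rebuttal leaves 'differential privacy techniques for anonymized participation' abstract. One standard building block is the Laplace mechanism for aggregating bounded participant responses; the sketch below assumes one clipped response per person and a privacy budget epsilon, none of which the manuscript specifies.

    import numpy as np

    def dp_mean(responses, lower, upper, epsilon, rng=None):
        """Differentially private mean of bounded participant responses
        via the Laplace mechanism (one clipped response per person)."""
        if rng is None:
            rng = np.random.default_rng()
        clipped = np.clip(np.asarray(responses, dtype=float), lower, upper)
        n = len(clipped)
        sensitivity = (upper - lower) / n  # max change in the mean from one person
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
        return float(clipped.mean() + noise)

    # Example: aggregate 1-5 survey answers from consulted community members
    # before conditioning simulation agents on them.
    answers = [4, 5, 3, 4, 2, 5, 4]
    print(dp_mean(answers, lower=1, upper=5, epsilon=1.0))

This protects individual respondents within the aggregate signal, though it does not by itself supply the consent or involvement that precondition 2 demands.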
Referee: [Conclusion] The sufficiency argument for the overall framework: no mechanism is supplied showing why accountability or non-neutrality framing would limit the dual-use potential of accurate individual- or population-level behavioral models, making the prescriptive recommendation rest on an unsupported assumption.
Authors: This comment correctly identifies that the paper does not detail a specific mechanism by which these preconditions would constrain dual-use. As the work is conceptual and aims to set normative boundaries, it assumes that accountability structures and non-neutral framing can reduce risks through increased scrutiny and ethical awareness. We will revise the conclusion to explicitly frame this as a reasoned proposal rather than a proven sufficiency, and add a call for research into enforcement and effectiveness. We cannot provide an empirical mechanism at this stage, but we will clarify the argumentative basis. (Revision: partial)
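The 'increased scrutiny' the authors appeal to presupposes some record of who ran which simulation, and for what purpose. A minimal sketch of a tamper-evident audit trail, with entirely hypothetical field names and example data, suggests this part of the accountability story is at least cheap to implement:

    import hashlib, json, time

    def append_audit_entry(log, actor, simulation_id, purpose):
        """Append a tamper-evident entry to an audit log: each entry hashes
        its predecessor, so any later edit breaks the chain."""
        prev_hash = log[-1]["hash"] if log else "0" * 64
        entry = {
            "timestamp": time.time(),
            "actor": actor,                 # who ran or deployed the simulation
            "simulation_id": simulation_id,
            "purpose": purpose,             # stated policy question
            "prev_hash": prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        log.append(entry)
        return entry

    log = []
    append_audit_entry(log, "analyst@agency.example", "sim-042", "rent-control scenario")

Whether such a log actually constrains dual-use, rather than merely documenting it, is exactly the open question the referee raises.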
Circularity Check
No circularity: normative policy recommendations with no derivations or self-referential reductions
Full rationale
The paper advances three ethical preconditions for LLM agent simulations as forward-looking policy proposals grounded in identified challenges (dual-use potential and validation difficulties). It contains no equations, no fitted parameters, no predictions derived from data subsets, and no self-citations that serve as load-bearing premises. The central claim—that the preconditions plus development reports will build trust and promote responsible use—is presented as a belief rather than a derived result, with no reduction to prior inputs by construction. This is a standard non-circular position paper whose argument rests on normative reasoning rather than any technical or empirical chain that could collapse into self-definition or fitted inputs.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLM agent simulations have dual-use potential and are difficult to validate, necessitating ethical boundaries for developers and decision-makers.