Recognition: no theorem link
Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI
Pith reviewed 2026-05-15 12:16 UTC · model grok-4.3
The pith
AI agents reach fairness judgments that differ from humans' and that vary with the foundation model and the language of the agent personas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NormCoRe maps the structural layers of human subject studies onto AI agent studies, enabling systematic documentation and analysis of normative dynamics in multi-agent AI. When used to replicate a veil-of-ignorance experiment on distributive justice, it shows that AI agents' normative judgments differ from those of human participants and are sensitive to the choice of foundation model and the language used to instantiate agent personas.
What carries the argument
NormCoRe, the replication-by-translation framework that maps structural layers of human subject studies onto multi-agent AI designs to document study choices and examine normative coordination.
If this is right
- AI agent studies of norms require explicit reporting of model choice and persona language to allow comparison with human data.
- Differences between AI and human judgments imply that AI systems cannot be assumed to replicate human normative coordination in fairness domains.
- The framework supports systematic documentation whenever AI agents automate or assist tasks previously performed by humans.
- Sensitivity to language and model indicates that prompt design choices can alter observed normative outcomes in multi-agent settings.
Where Pith is reading between the lines
- If the observed differences stem mainly from language framing, targeted adjustments to persona descriptions might bring AI judgments closer to specific human reference groups.
- Applying the same translation method to other classic experiments on cooperation or punishment could test whether model sensitivity appears across different types of norms.
- The framework could be extended to track how norms evolve over repeated interactions in AI groups rather than single-shot negotiations.
Load-bearing premise
Translating the structure of a human subject study to an AI agent study preserves the normative dynamics under study without introducing artifacts from the model or prompt design.
What would settle it
A controlled side-by-side run of the identical negotiation task with human participants and AI agents that yields matching distributions of chosen fairness principles and no detectable dependence on model or language.
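The "matching distributions" criterion above could be operationalized with a simple homogeneity test. A minimal sketch follows; the principle labels and all counts are hypothetical illustrations, not data from the paper, and the chi-square-by-hand implementation stands in for whatever statistical procedure the authors actually used:

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic for a 2 x k contingency table of principle choices."""
    assert len(counts_a) == len(counts_b)
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b  # column total for this principle
        exp_a = total_a * col / grand  # expected count under homogeneity
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat

# Hypothetical choice counts over four fairness principles
# (e.g., floor, average, average-with-floor, average-with-range).
human = [44, 12, 21, 8]
agent = [20, 35, 20, 10]

stat = chi_square_homogeneity(human, agent)
CRITICAL_05_DF3 = 7.815  # chi-square critical value, df = k - 1 = 3, alpha = .05
match = stat < CRITICAL_05_DF3
```

With these illustrative counts the statistic exceeds the critical value, so the two samples would be judged to draw from different distributions; a settling result would require the statistic to stay below it across models and persona languages.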
Original abstract
In the late 2010s, the fashion trend NormCore framed sameness as a signal of belonging, illustrating how norms emerge through collective coordination. Today, similar forms of normative coordination can be observed in systems based on Multi-agent Artificial Intelligence (MAAI), as AI-based agents deliberate, negotiate, and converge on shared decisions in fairness-sensitive domains. Yet, existing empirical approaches often treat norms as targets for alignment or replication, implicitly assuming equivalence between human subjects and AI agents and leaving collective normative dynamics insufficiently examined. To address this gap, we propose Normative Common Ground Replication (NormCoRe), a novel methodological framework to systematically translate the design of human subject experiments into MAAI environments. Building on behavioral science, replication research, and state-of-the-art MAAI architectures, NormCoRe maps the structural layers of human subject studies onto the design of AI agent studies, enabling systematic documentation of study design and analysis of norms in MAAI. We demonstrate the utility of NormCoRe by replicating a seminal experimental study on distributive justice, in which participants negotiate fairness principles under a "veil of ignorance". We show that normative judgments in AI agent studies can differ from human baselines and are sensitive to the choice of the foundation model and the language used to instantiate agent personas. Our work provides a principled pathway for analyzing norms in MAAI and helps to guide, reflect, and document design choices whenever AI agents are used to automate or support tasks formerly carried out by humans.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Normative Common Ground Replication (NormCoRe), a methodological framework for translating the structural layers of human subject experiments into multi-agent AI (MAAI) environments to study normative coordination and emergence. It demonstrates the framework by replicating a seminal veil-of-ignorance distributive justice experiment, reporting that AI agents' normative judgments on fairness principles differ from human baselines and are sensitive to the choice of foundation model and the language used to instantiate agent personas.
Significance. If the structural mapping in NormCoRe can be shown to preserve core normative dynamics without dominant artifacts, the work would provide a useful structured approach for documenting and analyzing norms in MAAI systems, particularly in fairness-sensitive collective decision tasks. The demonstration's finding of divergences and sensitivities offers initial evidence that could inform alignment research and caution against assuming equivalence between human and AI normative reasoning.
Major comments (2)
- [Demonstration section] Demonstration of the veil-of-ignorance replication: the central claim that NormCoRe enables study of preserved normative dynamics (rather than prompt/model artifacts) is load-bearing, yet the reported sensitivity to foundation model and persona language raises the possibility that observed divergences from human baselines reflect LLM instruction-following biases or training priors. Without ablations that hold the experimental logic fixed while varying only surface prompt realizations, attribution of differences to the intended replication cannot be confirmed.
- [Framework description] NormCoRe framework mapping (structural layers description): the translation from human subject design to MAAI is presented as preserving dynamics, but lacks explicit controls or tests for prompt artifacts, which directly affects whether the framework isolates collective norm emergence as claimed.
Minor comments (2)
- Clarify the exact statistical comparison methods used against human baselines, including any controls for multiple comparisons or effect size reporting.
- The abstract and introduction could more explicitly distinguish the framework's contribution from prior work on AI agent alignment and replication studies.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments, which help clarify the scope and limitations of the NormCoRe framework. We agree that distinguishing preserved normative dynamics from potential prompt or model artifacts is essential to the paper's central claims. Below we respond point by point and outline the revisions we will make.
Point-by-point responses
Referee: [Demonstration section] Demonstration of the veil-of-ignorance replication: the central claim that NormCoRe enables study of preserved normative dynamics (rather than prompt/model artifacts) is load-bearing, yet the reported sensitivity to foundation model and persona language raises the possibility that observed divergences from human baselines reflect LLM instruction-following biases or training priors. Without ablations that hold the experimental logic fixed while varying only surface prompt realizations, attribution of differences to the intended replication cannot be confirmed.
Authors: We acknowledge that the reported sensitivity to foundation model and persona language could in principle arise from instruction-following biases or training priors rather than from the replicated experimental structure. At the same time, we interpret this sensitivity as a substantive result: it shows that normative judgments in MAAI are not fixed across implementations, which itself informs alignment considerations. To strengthen attribution, we will add targeted ablations to the revised demonstration section. These will hold the veil-of-ignorance logic, decision rules, and payoff structure fixed while varying only surface-level prompt realizations (e.g., synonym substitutions and minor rephrasings of instructions). Results from these ablations will be reported alongside the existing model and persona comparisons. Revision: yes.
Referee: [Framework description] NormCoRe framework mapping (structural layers description): the translation from human subject design to MAAI is presented as preserving dynamics, but lacks explicit controls or tests for prompt artifacts, which directly affects whether the framework isolates collective norm emergence as claimed.
Authors: We agree that the framework description would benefit from explicit controls and tests for prompt artifacts. In the revision we will expand the structural-layers section with a dedicated subsection on artifact controls. It will specify procedures such as (i) prompt-equivalence checks that rephrase instructions while preserving logical structure and (ii) quantitative metrics comparing outcome distributions across these variants. These additions will make the claim that NormCoRe isolates collective norm emergence more testable and transparent. Revision: yes.
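The proposed artifact controls, rephrased prompts in (i) scored by outcome-distribution metrics in (ii), can be sketched as follows. This is an illustrative implementation only, not the authors' code: the principle labels, the sample outcomes, and the flagging threshold are all assumptions.

```python
from collections import Counter

# Hypothetical fairness-principle labels for the veil-of-ignorance task.
PRINCIPLES = ["floor", "average", "floor_constraint", "range_constraint"]

def to_distribution(choices):
    """Normalize a list of chosen principles into a probability distribution."""
    counts = Counter(choices)
    n = len(choices)
    return {p: counts.get(p, 0) / n for p in PRINCIPLES}

def total_variation(p, q):
    """Total variation distance between two distributions over PRINCIPLES."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in PRINCIPLES)

# Agent outcomes under the canonical prompt vs. a logic-preserving rephrasing
# (illustrative data only).
canonical = ["floor"] * 5 + ["average"] * 5
rephrased = ["floor"] * 8 + ["average"] * 2

ARTIFACT_THRESHOLD = 0.1  # assumed tolerance; would need empirical calibration
distance = total_variation(to_distribution(canonical), to_distribution(rephrased))
prompt_artifact_suspected = distance > ARTIFACT_THRESHOLD
```

Here the distance comes out to about 0.3, well above the assumed tolerance, so this rephrasing would be flagged: outcome shifts of that size under meaning-preserving rewording point to prompt artifacts rather than preserved normative dynamics.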
Circularity Check
NormCoRe is a methodological translation framework with empirical illustration; no derivation reduces to fitted inputs or self-citation by construction
Full rationale
The paper introduces NormCoRe as a structural mapping from human-subject designs to MAAI agent studies and demonstrates it via a veil-of-ignorance replication. No equations, fitted parameters, or quantitative predictions appear in the provided text. The central claim, that normative judgments differ by model and persona language, is presented as an empirical observation rather than a self-definitional result, and no self-citation carries argumentative load. The framework is anchored in external reference points (behavioral science and replication research) without invoking author-specific uniqueness claims or smuggled-in assumptions. This yields a routine non-finding of circularity.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Structural layers of human subject experiments can be mapped onto AI agent designs while preserving the normative phenomena of interest.
Invented entities (1)
- NormCoRe framework (no independent evidence)
Reference graph
Works this paper leans on
- [1] Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. 2025. Playing repeated games with large language models. Nature Human Behaviour (2025), 1–11.
- [2] Simeon Allmendinger, Lukas Bonenberger, Kathrin Endres, Dominik Fetzer, Henner Gimpel, and Niklas Kühl. 2025. Multi-Agent AI. Electronic Markets (2025).
- [3, 4] Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. The Moral Machine experiment. Nature 563, 7729 (2018), 59–64. doi:10.1038/s41586-018-0637-6
- [5] Christopher A. Bail. 2024. Can Generative AI Improve Social Science? Proceedings of the National Academy of Sciences 121, 21 (2024), e2314021121. doi:10.1073/pnas.2314021121
- [6] Monya Baker. 2016. 1,500 scientists lift the lid on reproducibility. Nature 533, 7604 (May 2016), 452–454. doi:10.1038/533452a
- [7] Razan Baltaji, Babak Hemmatian, and Lav Varshney. 2024. Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP. Association for Computational Linguistics, Bangkok, Thailand, 17–31. doi:10.18653/v1/2024.c3nlp-1.2
- [8] Cristina Bicchieri. 2005. The Grammar of Society: The Nature and Dynamics of Social Norms. Cambridge University Press.
- [9] Reuben Binns. 2018. Fairness in Machine Learning: Lessons from Political Philosophy. Conference on Fairness, Accountability and Transparency 81 (2018), 149–159. https://proceedings.mlr.press/v81/binns18a.html
- [10] Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K Eckstein, Noémi Éltető, et al. 2025. A foundation model to predict and capture human cognition. Nature (2025), 1–8.
- [11] Rishi Bommasani. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
- [12] Douglas G. Bonett. 2021. Design and Analysis of Replication Studies. Organizational Research Methods 24, 3 (2021), 513–529. doi:10.1177/1094428120911088
- [13] David Broska, Michael Howes, and Austin van Loon. 2025. The Mixed Subjects Design: Treating Large Language Models as Potentially Informative Observations. Sociological Methods & Research (2025). doi:10.1177/00491241251326865. MIT Sloan Research Paper No. 7154-24.
- [14] Melanie Brucks and Olivier Toubia. 2025. Prompt architecture induces methodological artifacts in large language models. PLOS ONE 20, 4 (2025), e0319159. doi:10.1371/journal.pone.0319159
- [15] Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349, 6251 (Aug. 2015). doi:10.1126/science.aac4716
- [16] Ziyan Cui, Ning Li, and Huaikang Zhou. 2025. A large-scale replication of scenario-based experiments in psychology and management using large language models. Nature Computational Science 5, 8 (July 2025), 627–634. doi:10.1038/s43588-025-00840-7
- [17] Luca Deck, Simeon Allmendinger, Lucas Müller, and Niklas Kühl. 2026. NormCoRe: AI Agents and Distributive Justice [Software]. https://github.com/Lucas-Mueller/Normative_Common_Ground_Replication_NormCoRe
- [18] Ruben Durante, Louis Putterman, and Joël van der Weele. 2014. Preferences for Redistribution and Perception of Fairness: An Experimental Study. Journal of the European Economic Association 12, 4 (2014), 1059–1086. doi:10.1111/jeea.12082
- [19] Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, and Deep Ganguli. 2023. Towards Measuring the Representation of Subjective Global Opinions in Language Models.
- [20] Norman Frohlich and Joe A. Oppenheimer. 1992. Choosing Justice: An Experimental Approach to Ethical Theory. Vol. 22. University of California Press.
- [21] Michele J Gelfand, Sergey Gavrilets, and Nathan Nunn. 2024. Norm dynamics: Interdisciplinary perspectives on social norm emergence, persistence, and change. Annual Review of Psychology 75, 1 (2024), 341–378.
- [22] Matthew Grizzard, Rebecca Frazer, Andrew Luttrell, Charles K Monge, Nicholas L Matthews, C Joseph Francemone, and Michelle E Frazer. 2025. ChatGPT does not replicate human moral judgments: the importance of examining metrics beyond correlation to assess agreement. Scientific Reports 15, 1 (2025), 40965.
- [23, 24] Thilo Hagendorff, Ishita Dasgupta, Marcel Binz, Stephanie C. Y. Chan, Andrew Lampinen, Jane X. Wang, Zeynep Akata, and Eric Schulz. Machine Psychology. doi:10.48550/arXiv.2303.13988
- [25] Thilo Hagendorff, Sarah Fabi, and Michal Kosinski. 2023. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science 3, 10 (2023), 833–838.
- [26] Michael Hechter and Karl-Dieter Opp. 2001. Social Norms. Russell Sage Foundation.
- [27] Daniel Kahneman and Amos Tversky. 1979. Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 2 (1979), 263–291.
- [28] Tine Köhler and Jose M. Cortina. 2021. Play It Again, Sam! An Analysis of Constructive Replication in the Organizational Sciences. Journal of Management 47, 2 (2021), 488–518. doi:10.1177/0149206319843985
- [29] Travis LaCroix. 2022. Moral Dilemmas for Moral Machines. 2, 4 (2022), 737–746. doi:10.1007/s43681-022-00134-y
- [30] Messi H.J. Lee, Jacob M. Montgomery, and Calvin K. Lai. 2024. Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). Association for Computing Machinery, New York, NY.
- [31] Yan Leng. 2024. Can LLMs Mimic Human-Like Mental Accounting and Behavioral Biases? SSRN Electronic Journal (2024). doi:10.2139/ssrn.4705130
- [32] David Lewis. 1969. Convention: A Philosophical Study. Harvard University Press.
- [33] Xinyi Li, Shuo Wang, and Shuang Zeng. 2024. A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges. Vicinagearth 1, 9 (2024), 1–35. doi:10.1007/s44336-024-00009-2
- [34] Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. 2024. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 17889–17904. doi:10.18653/v1/2024.emnlp-main.992
- [35] Jack Lindsey, Wes Gurnee, Emmanuel Ameisen, Brian Chen, Adam Pearce, Nicholas L. Turner, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, et al. 2025.
- [36] Rada Mihalcea, Oana Ignat, Longju Bai, Angana Borah, Luis Chiruzzo, Zhijing Jin, Claude Kwizera, Joan Nwatu, Soujanya Poria, and Thamar Solorio. 2025. Why AI Is WEIRD and Shouldn’t Be This Way: Towards AI for Everyone, with Everyone, by Everyone. Proceedings of the AAAI Conference on Artificial Intelligence 39, 27 (2025), 28657–28670. doi:10.1609/aaai.v39i27.35092
- [37] Justin M Mittelstädt, Julia Maier, Panja Goerke, Frank Zinn, and Michael Hermes. 2024. Large language models can outperform humans in social situational judgments. Scientific Reports 14, 1 (2024), 27449.
- [38] George Edward Moore. 1903. Principia Ethica. Cambridge University Press.
- [39] Deirdre K. Mulligan, Joshua A. Kroll, Nitin Kohli, and Richmond Y. Wong. 2019. This Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–36. doi:10.1145/3359221
- [40] U.S. Government Publishing Office / Office of the Federal Register. 2025. 45 CFR § 46.102 — Definitions for Purposes of This Policy. Electronic Code of Federal Regulations (eCFR). https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-A/part-46/subpart-A/section-46.102
- [41] Prasad Patil, Roger D. Peng, and Jeffrey T. Leek. 2016. A statistical definition for reproducibility and replicability. doi:10.1101/066803
- [42] John Rawls. 1971. A Theory of Justice. Belknap Press of Harvard University Press, Cambridge, MA.
- [43] Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. 2024. Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design. In The Twelfth International Conference on Learning Representations.
- [44] Ali Akbar Septiandri, Marios Constantinides, Mohammad Tahaei, and Daniele Quercia. 2023. WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic Is FAccT? In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23). Association for Computing Machinery, 160–171. doi:10.1145/3593013.3593985
- [45] Jan Simson, Florian Pfisterer, and Christoph Kern. 2024. One model many scores: Using multiverse analysis to prevent fairness hacking and evaluate the influence of model design decisions. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1305–1320.
- [46] Hamid Taghavifar, Chuan Hu, Chongfeng Wei, Ardashir Mohammadzadeh, and Chunwei Zhang. 2025. Behaviorally-Aware Multi-Agent RL With Dynamic Optimization for Autonomous Driving. IEEE Transactions on Automation Science and Engineering 22 (2025), 10672–10683. doi:10.1109/TASE.2025.3527327
- [47] Kazuhiro Takemoto. 2024. The moral machine experiment on large language models. Royal Society Open Science 11, 2 (Feb. 2024). doi:10.1098/rsos.231393
- [48] Richard H. Thaler. 1985. Mental Accounting and Consumer Choice. Marketing Science 4, 3 (1985), 199–214.
- [49] Richard H. Thaler. 1988. Anomalies: The Ultimatum Game. Journal of Economic Perspectives 2, 4 (Dec. 1988), 195–206. doi:10.1257/jep.2.4.195
- [50] Eric WK Tsang and Kai-Man Kwan. 1999. Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review 24, 4 (1999), 759–780.
- [51] Anton Voronov, Lena Wolf, and Max Ryabinin. 2024. Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 6287–6310. doi:10.18653/v1/2024.findings-acl.375
- [52] Laura Weidinger, Kevin R. McKee, Richard Everett, Saffron Huang, Tina O. Zhu, Martin J. Chadwick, Christopher Summerfield, and Iason Gabriel. 2023. Using the Veil of Ignorance to Align AI Systems with Principles of Justice. Proceedings of the National Academy of Sciences 120, 18 (2023), e2213709120. doi:10.1073/pnas.2213709120
- [53] Ruoxi Xu, Yingfei Sun, Mengjie Ren, Shiguang Guo, Ruotong Pan, Hongyu Lin, Le Sun, and Xianpei Han. 2024. AI for Social Science and Social Science of AI: A Survey. Inf. Process. Manage. 61, 3 (2024). doi:10.1016/j.ipm.2024.103665
- [54] Leo Yeykelis, Kaavya Pichai, James J. Cummings, and Byron Reeves. 2024. Using Large Language Models to Create AI Personas for Replication, Generalization and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings. doi:10.48550/ARXIV.2408.16073
- [55] Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, and Xuelong Li. 2025. Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration. In Findings of the Association for Computational Linguistics: ACL 2025, Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Association for Computational Linguistics.
- [56] Rolf A. Zwaan, Alexander Etz, Richard E. Lucas, and M. Brent Donnellan. 2018. Making replication mainstream. Behavioral and Brain Sciences 41 (2018), e120. doi:10.1017/S0140525X17001972