Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research

Jennifer Haase; Sebastian Pokutta

arxiv: 2506.01839 · v3 · pith:QOHEJT34new · submitted 2025-06-02 · 💻 cs.MA

Pith reviewed 2026-05-22 01:07 UTC · model grok-4.3

keywords: LLM agents, multi-agent systems, social science research, emergent social dynamics, group dynamics, norm formation, reproducibility, ethical challenges

The pith

Multi-agent systems of large language models can simulate emergent social dynamics to advance social science research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a six-level framework for LLM-based agent systems to show how they progress from simple tools to complex simulators of social behavior. Higher levels allow investigation of group dynamics and norm formation in ways that scale beyond traditional research methods. A reader would care because this could change how social scientists run experiments on large groups, without the logistical and ethical barriers of studies with human participants. The authors note that realizing this requires solving problems of reproducibility and bias through careful protocols and collaboration.

Core claim

By mapping LLM agents across six levels of increasing complexity, the paper shows that higher-tier multi-agent systems can simulate emergent social dynamics, enabling new forms of inquiry into group processes and large-scale social phenomena that static or single-agent approaches cannot achieve.
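
To make "emergent social dynamics" concrete, the following is a minimal sketch of norm formation using the naming game, a classic agent-based model that is not taken from the paper. The agents here are simple rule-based stand-ins for LLM agents, and the function name, population size, and round limit are all illustrative assumptions.

```python
import random


def simulate_naming_game(n_agents=10, max_rounds=50_000, seed=0):
    """Minimal naming game (hypothetical helper, not from the paper).

    Returns (rounds_to_consensus, shared_name), or (None, None) if the
    population has not converged within max_rounds.
    """
    rng = random.Random(seed)
    # Each agent keeps a set of candidate names for the same convention.
    inventories = [set() for _ in range(n_agents)]
    next_name = 0  # counter used to invent fresh names

    for round_idx in range(1, max_rounds + 1):
        speaker, listener = rng.sample(range(n_agents), 2)

        # A speaker with no candidates invents a new name.
        if not inventories[speaker]:
            inventories[speaker].add(next_name)
            next_name += 1

        name = rng.choice(sorted(inventories[speaker]))

        if name in inventories[listener]:
            # Success: both agents drop every competing name.
            inventories[speaker] = {name}
            inventories[listener] = {name}
        else:
            # Failure: the listener learns the name as a new candidate.
            inventories[listener].add(name)

        # Consensus: every agent holds exactly the same single name.
        first = inventories[0]
        if len(first) == 1 and all(inv == first for inv in inventories):
            return round_idx, next(iter(first))

    return None, None


if __name__ == "__main__":
    rounds, norm = simulate_naming_game()
    print(f"consensus after {rounds} interactions on name {norm}")
```

No agent is told which name to prefer, yet the population settles on one shared convention through pairwise interaction alone. An LLM-based version would replace the fixed update rule with model-generated responses, which is where the fidelity questions raised in this review begin.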

What carries the argument

A six-level developmental continuum of agent architectures that separates basic data-processing agents from advanced multi-agent systems able to exhibit and study emergent social behaviors.

If this is right

Lower-tier agents streamline routine tasks like text classification and data annotation in social research.
Higher-tier systems facilitate the study of group dynamics, norm formation, and large-scale social processes.
Challenges such as reproducibility, ethical oversight, and emergent biases must be addressed for reliable use.
Robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics are essential.
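
One way to make the reproducibility concern measurable for lower-tier annotation tasks is to repeat the same labeling run several times and check how often the labels agree. The sketch below is an assumption about what such a check could look like, not a protocol from the paper; the function name and the example labels are hypothetical.

```python
from collections import Counter


def run_agreement(runs):
    """Summarize label stability across repeated runs (hypothetical helper).

    runs: list of label lists, one list per repeated run, aligned by item.
    Returns (share of items labeled identically in every run,
             mean share of runs that agree with each item's modal label).
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one item")
    n_items = len(runs[0])
    if any(len(run) != n_items for run in runs):
        raise ValueError("all runs must label the same items")

    modal_shares = []
    for labels in zip(*runs):  # all labels assigned to one item
        modal_count = Counter(labels).most_common(1)[0][1]
        modal_shares.append(modal_count / len(runs))

    fully_stable = sum(1 for s in modal_shares if s == 1.0) / n_items
    mean_modal_share = sum(modal_shares) / n_items
    return fully_stable, mean_modal_share


if __name__ == "__main__":
    # Made-up labels from three repeated runs over three items.
    example_runs = [
        ["pos", "neg", "neu"],
        ["pos", "neg", "pos"],
        ["pos", "neg", "neu"],
    ]
    print(run_agreement(example_runs))
```

Reporting both numbers separates items that never change from items that flip only occasionally, which a single averaged accuracy figure would hide.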

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Virtual experiments could test theories of social influence at population scales previously unfeasible.
Connections to fields like computational sociology may emerge if the simulations prove accurate.
Risks of over-reliance on simulated data could lead to new methodological debates in the social sciences.

Load-bearing premise

That the technical distinctions between agent levels are clear and that multi-agent LLM simulations can produce emergent social dynamics that correspond meaningfully to real human interactions.

What would settle it

An experiment matching results from a multi-agent LLM simulation of norm emergence against parallel real-world human group studies; mismatches in patterns would indicate the simulations do not yield valid insights.
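
The comparison described above could be scored in many ways. As one hedged illustration, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between an outcome measured in simulated groups and the same outcome measured in human groups. The choice of statistic, the function name, and the sample values are assumptions for illustration; neither the paper nor this review specifies a metric.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (hypothetical helper).

    Returns the largest gap between the two empirical CDFs:
    0.0 means identical distributions, 1.0 means no overlap.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # The maximum gap can only occur at an observed value.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


if __name__ == "__main__":
    # Made-up values: interactions needed to reach a shared norm.
    simulated_groups = [12, 15, 11, 14, 13]
    human_groups = [20, 25, 22, 30, 27]
    print(ks_statistic(simulated_groups, human_groups))
```

A large statistic would count as the kind of mismatch described above, but where to draw the line is a study-design decision that would need to be fixed before the data are collected.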

Figures

Figures reproduced from arXiv: 2506.01839 by Jennifer Haase, Sebastian Pokutta.

Original abstract

As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a position paper that organizes LLM agent uses in social science into six levels but offers no tests or data to back its claims about valid emergent insights.

The paper's core move is a six-level developmental scale for multi-agent LLM systems, from basic single agents doing classification up to setups meant to simulate group norms and large-scale social processes. It pulls together scattered applications and flags practical hurdles like reproducibility and bias. That synthesis is the main contribution and it is useful as a quick map for people entering the area.

The discussion of ethical oversight and the need for validation protocols is straightforward and hits real concerns without overclaiming. Credit there for keeping the tone measured on deployment issues.

The soft spots are central rather than peripheral. The framework stays descriptive with no worked examples, no technical specs that would let someone implement or distinguish the levels cleanly, and no comparison to existing social-science benchmarks or human data. The claim that higher levels enable novel inquiry into dynamics like norm formation therefore rests on an untested assumption that LLM-generated emergence will be faithful enough to real behavior. The stress-test note is right on this point; nothing in the text supplies criteria or evidence that would make the outputs more than LLM-specific artifacts.

Readers who already work with agent simulations might skim it for the taxonomy and move on. It could fit a reading group as background for debating how much simulation work should count as social science, but it is not the kind of piece that changes methods or findings. A serious editor could send it for review at a venue that takes position papers, mainly to get referees to press for clearer boundaries or pilot validations. Without that, it stays an organizational note rather than a grounded proposal.

Referee Report

2 major / 1 minor

Summary. The paper introduces a six-level developmental framework for LLM-based agent systems in social science research, ranging from basic data-processing agents to complex multi-agent architectures that purportedly simulate emergent phenomena such as group dynamics and norm formation. It contrasts lower-tier systems' utility for conventional tasks like annotation with higher-tier systems' potential for novel inquiries, while cataloging challenges around reproducibility, ethics, and bias, and calling for validation protocols and interdisciplinary standards.

Significance. If the six-level taxonomy proves operationalizable and the fidelity of higher-tier emergent behaviors to human social processes can be established, the framework could serve as a useful organizing device for researchers transitioning from static LLM tools to agentic simulations. However, the manuscript supplies no empirical demonstrations, benchmark comparisons, or formal criteria, so its significance remains prospective rather than demonstrated.

major comments (2)

[Abstract and six-level framework description] The central claim that higher-tier multi-agent systems enable valid study of group dynamics and large-scale social processes (abstract) rests on an untested assumption of behavioral fidelity; the manuscript provides neither explicit technical criteria (interaction protocols, memory mechanisms, alignment procedures) for distinguishing the six levels nor any comparison against established social-science benchmarks or human data.
[Framework overview] The paper asserts that the framework clarifies 'technical and methodological boundaries' between architectures, yet offers only descriptive classification without operational definitions, decision rules, or examples that would allow a reader to assign a given system to a level or evaluate its claimed capabilities.

minor comments (1)

[Abstract and Conclusion] The abstract and conclusion repeat the call for 'robust validation protocols' and 'standardized evaluation metrics' without specifying what those protocols or metrics would look like or citing relevant existing work in agent evaluation or social simulation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments, which help us clarify the scope and utility of our proposed six-level framework. We address each major comment below and describe the revisions we will make to improve the manuscript.

Referee: [Abstract and six-level framework description] The central claim that higher-tier multi-agent systems enable valid study of group dynamics and large-scale social processes (abstract) rests on an untested assumption of behavioral fidelity; the manuscript provides neither explicit technical criteria (interaction protocols, memory mechanisms, alignment procedures) for distinguishing the six levels nor any comparison against established social-science benchmarks or human data.

Authors: We appreciate the referee's emphasis on the distinction between potential and demonstrated validity. The manuscript is a conceptual framework paper whose primary contribution is to organize existing architectures along a developmental continuum; the abstract language regarding higher-tier systems is deliberately prospective ('enable novel forms of inquiry') rather than a claim of established behavioral fidelity. We agree that the framework would be strengthened by explicit technical criteria. In the revised manuscript we will add a new subsection that specifies distinguishing features for each level, including interaction protocols, memory mechanisms, and alignment procedures. We will also incorporate references to existing empirical studies that begin to benchmark agent behaviors against human data and will add an explicit statement that systematic validation against social-science benchmarks remains an open research priority.

Revision: yes

Referee: [Framework overview] The paper asserts that the framework clarifies 'technical and methodological boundaries' between architectures, yet offers only descriptive classification without operational definitions, decision rules, or examples that would allow a reader to assign a given system to a level or evaluate its claimed capabilities.

Authors: This observation is fair. The current version relies primarily on narrative description. We will revise the framework overview to include operational definitions and decision rules for level assignment, together with concrete examples of published systems at each level. These additions will be presented in a structured table and accompanying text so that readers can more readily classify new systems and assess their capabilities relative to the framework's claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in descriptive classification framework

The paper introduces a six-level descriptive taxonomy of LLM agent architectures ranging from simple processors to multi-agent systems for simulating social dynamics. This is a conceptual mapping with no equations, fitted parameters, derivations, or quantitative predictions that could reduce to the paper's own inputs by construction. Central claims about enabling novel social science inquiry are presented as forward-looking potential rather than derived results, and the framework stands as self-contained without load-bearing self-citations or ansatzes that collapse into tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that multi-agent LLM systems can meaningfully simulate emergent social dynamics; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: LLM-based agents can be meaningfully organized into a six-level developmental continuum with distinct technical and methodological boundaries.
This organizing principle is invoked throughout the abstract to distinguish capabilities and applications.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation to the paper passage: unclear

Paper passage:
By mapping this developmental continuum across six levels... higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage:
functional thresholds—such as memory integration, autonomy, coordination, and learning... aligned with the OODA loop

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agentic MIP Research: Accelerated Constraint Handler Generation
cs.AI · 2026-05 · unverdicted · novelty 7.0

LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.

