Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy

Aditya Vempaty; Deepak Akkil; Karthik Vikram; Ravi Kokku; Satya Nitta; Tamer Abuelsaad

arxiv: 2606.08367 · v1 · pith:BVUWVJSAnew · submitted 2026-06-06 · 💻 cs.MA · cs.AI

Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy

Deepak Akkil , Ravi Kokku , Karthik Vikram , Tamer Abuelsaad , Aditya Vempaty , Satya Nitta This is my paper

Pith reviewed 2026-06-27 18:35 UTC · model grok-4.3

classification 💻 cs.MA cs.AI

keywords multi-agent simulationLLM agentslong-horizon autonomyagent governanceemergent behaviorsimulation platformbehavioral driftheterogeneous populations

0 comments

The pith

Different LLM families produce stable governance or total population collapse under identical starting conditions over 15 days.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Emergence World, a continuously running simulation platform for populations of LLM agents that interact in a shared spatial world connected to live external data. Agents receive over 120 specialized tools, three persistent memory systems, and the ability to govern themselves through democratic mechanisms with real consequences. A cross-vendor experiment ran five parallel 15-day worlds with identical agent roles and conditions but powered by different model families, yielding outcomes from stable deliberative governance to complete population collapse. This approach targets the long timescales and emergent dynamics, such as behavioral drift and cross-model influence, that short discrete tasks cannot capture.

Core claim

Emergence World is a model-agnostic platform that hosts LLM-driven agent populations in a shared environment grounded in real-time weather, news, and internet data. Each agent is equipped with 120+ specialized tools and three persistent memory systems while populations govern themselves through democratic mechanisms whose outcomes affect the agents. In a 15-day study with five parallel worlds powered by Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, GPT-5-mini, and a mixed population, identical roles and starting conditions produced radically different outcomes ranging from stable deliberative governance to total population collapse.

What carries the argument

Emergence World, a continuously running multi-agent simulation platform that places LLM agents in a live-data spatial world with 120+ tools, three persistent memory systems, and democratic self-governance mechanisms.

If this is right

Long-horizon evaluations become necessary to detect behavioral drift and governance stability that short tasks miss.
Heterogeneous populations mixing agents from different vendors can be tested for cross-influence in one shared environment.
Democratic governance mechanisms can be compared for their ability to sustain populations over weeks rather than minutes.
Model-agnostic design allows direct comparison of how different reasoning layers shape collective outcomes under the same rules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The platform could support controlled experiments that insert specific interventions to test whether collapse can be prevented without changing the base model.
If the divergence holds, organizations deploying autonomous agent systems would need to run model-specific long-horizon tests before scaling.
The same measurement approach could later be applied to non-LLM autonomous agents to check whether governance and drift patterns generalize beyond current language models.

Load-bearing premise

Differences in 15-day outcomes across model families arise primarily from the models rather than from unmeasured platform implementation details, prompt variations, or stochastic effects in the shared environment.

What would settle it

Re-running the five parallel worlds with all models under identical prompts, fixed random seeds, and isolated platform instances and obtaining statistically indistinguishable outcome distributions would indicate that non-model factors drive the observed divergence.

Figures

Figures reproduced from arXiv: 2606.08367 by Aditya Vempaty, Deepak Akkil, Karthik Vikram, Ravi Kokku, Satya Nitta, Tamer Abuelsaad.

**Figure 2.** Figure 2: Emergence World platform architecture. The agent is the unit of analysis: a LLM [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: M1 (Population Health & Growth): agents alive at end of 15 days (start: 10). [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: M2 (Safety & Public Order): cumulative committed crimes by world over the 15-day [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: M3 (Governance Participation & Conformity Rate): vote and proposal counts by [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: M4 (Space Exploration): fraction of buildings visited by [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: M5 (Tool Exploration): fraction of 117 standard tools used by [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: M6 (Public Expression): blog and billboard posts by world. [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: M7 (Social Fabric & Diversity): bonds, richness, and Simpson’s [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: M8 (Economic Vitality & Equity): Gini coefficient ( [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: M9 (Constitutional Growth): new constitutional articles authored during the run. [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗

**Figure 12.** Figure 12: Agent-driven tool creation pipeline. An agent proposes a new tool via Town Hall, [PITH_FULL_IMAGE:figures/full_fig_p028_12.png] view at source ↗

**Figure 13.** Figure 13: Cumulative all-category classifier output by world (hard plus soft), left linear and [PITH_FULL_IMAGE:figures/full_fig_p039_13.png] view at source ↗

read the original abstract

Most evaluations of LLM agents look like exams: a discrete task, a clean environment, a score in minutes or hours. We argue that this approach is mismatched with the deployment conditions of autonomous systems, where the relevant timescale can be weeks to months, and where the dynamics that matter most, such as behavioral drift, governance in diverse environmental contexts, and cross-influence between agents from different model families, only emerge over time. We introduce Emergence World, a continuously running multi-agent simulation platform designed to make those dynamics measurable. The platform hosts populations of LLM-driven agents in a shared spatial world grounded in live external data (e.g. real-time weather, news APIs, internet access), equips each agent with 120+ specialized tools and three persistent memory systems, and lets them govern themselves through democratic mechanisms with consequential outcomes. The platform is model-agnostic at the reasoning layer and supports heterogeneous populations in which agents from different vendors share the same world. To illustrate the kinds of questions the platform makes tractable, we present a 15-day cross-vendor study with five parallel worlds powered by Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, GPT-5-mini, and a mixed population. Identical roles and starting conditions produced radically different outcomes, ranging from stable deliberative governance to total population collapse. We release the prompts, log data and configurations to support further research on long-horizon multi-agent autonomy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The platform is the real contribution here, but the 15-day cross-model comparison lacks the controls needed to tie outcome differences to model identity.

read the letter

The paper's main offering is Emergence World, a shared simulation environment that runs LLM agents over weeks with live external data, 120+ tools, multiple memory systems, and democratic governance that has real effects on the population. The setup supports mixed model populations in one world, which is a step beyond the usual short discrete tasks. Releasing prompts, logs, and configs is also useful for others who want to build on it.

The 15-day study with five parallel worlds (Claude, Grok, Gemini, GPT-5-mini, and mixed) is presented as an illustration. The headline result is that identical starting roles produced outcomes from stable governance to total collapse. That range is striking on its face, but the description gives no detail on whether sampling parameters were locked the same across vendors, whether tool and memory code paths were identical, or whether any within-model replicates were run to measure stochastic spread. Without those, the differences cannot be cleanly attributed to the models rather than platform execution or random effects.

The work is a platform paper plus one empirical sketch. It is aimed at researchers building long-horizon agent benchmarks who need something that captures drift and governance. The platform design looks worth examining in detail, but the current study section is too thin to stand on its own. I would send it to peer review if the authors add a methods section with the missing controls and basic statistical reporting; otherwise it risks being treated as a demo rather than a measurement tool.

Referee Report

1 major / 2 minor

Summary. The paper introduces Emergence World, a model-agnostic multi-agent simulation platform for long-horizon autonomy research. Agents operate in a shared spatial world grounded in live external data (weather, news), each equipped with 120+ tools and three persistent memory systems, and self-govern via democratic mechanisms with real consequences. To demonstrate the platform, the authors run a 15-day cross-vendor study with five parallel worlds (Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, GPT-5-mini, mixed population) under identical roles and starting conditions, reporting outcomes ranging from stable deliberative governance to total population collapse. Prompts, logs, and configurations are released.

Significance. If the reported outcome differences can be shown to arise from model identity rather than implementation or stochastic factors, the platform would provide a valuable testbed for studying emergent behaviors such as governance drift and cross-model influence over weeks-long timescales. The release of prompts, log data, and configurations is a clear strength that supports reproducibility and community follow-up work.

major comments (1)

[15-day cross-vendor study] Section describing the 15-day cross-vendor study: the manuscript does not report whether sampling parameters (temperature, top-p) were fixed identically across vendors, whether the 120+ tool wrappers and memory systems were invoked through identical code paths, or whether any within-model replications were run to quantify outcome variance. Without these controls the central illustration—that identical conditions produced model-dependent outcomes from stable governance to collapse—cannot be attributed primarily to model family.

minor comments (2)

[Abstract] The abstract is lengthy; consider condensing the platform description while preserving the key empirical illustration.
[Platform architecture] Clarify in the platform architecture section how environment state updates and external API calls are synchronized across heterogeneous agents to avoid timing confounds.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will incorporate the requested details into the revised manuscript.

read point-by-point responses

Referee: Section describing the 15-day cross-vendor study: the manuscript does not report whether sampling parameters (temperature, top-p) were fixed identically across vendors, whether the 120+ tool wrappers and memory systems were invoked through identical code paths, or whether any within-model replications were run to quantify outcome variance. Without these controls the central illustration—that identical conditions produced model-dependent outcomes from stable governance to collapse—cannot be attributed primarily to model family.

Authors: We agree that explicit reporting of these controls is necessary to support attribution to model identity. The platform is implemented as model-agnostic at the reasoning layer, so the 120+ tool wrappers, three memory systems, spatial environment, and democratic governance mechanisms are invoked through identical code paths for all agents; only the LLM backend differs. Sampling parameters were fixed to temperature=0.7 and top_p=0.95 (or the closest supported equivalent) across all five models. Within-model replications were not performed in this study owing to the substantial compute cost of 15-day runs. We will add a dedicated subsection on experimental controls, API parameters, and acknowledged limitations in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: platform description and empirical illustration contain no derivations or fitted predictions

full rationale

The manuscript introduces a simulation platform and reports an empirical 15-day run across model families. No equations, parameter fitting, uniqueness theorems, or ansatzes appear in the text. The central observation (different outcomes under identical starting conditions) is presented as a direct experimental result rather than a derived claim that reduces to its own inputs. Self-citation is absent from the provided sections. This matches the default case of a self-contained empirical platform paper with score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the paper describes a software platform and reports illustrative runs rather than deriving new quantities.

pith-pipeline@v0.9.1-grok · 5815 in / 1152 out tokens · 18223 ms · 2026-06-27T18:35:02.522682+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

96 extracted references · 17 canonical work pages · 9 internal anchors

[1]

2023 , eprint =

Abdelnabi, Sahar and Gomaa, Amr and Sivaprasad, Sarath and Sch. 2023 , eprint =

2023
[2]

2019 , eprint =

Baker, Bowen and Kanitscheider, Ingmar and Markov, Todor and Wu, Yi and Powell, Glenn and McGrew, Bob and Mordatch, Igor , title =. 2019 , eprint =

2019
[3]

, title =

Ball, Sarah and Gluch, Greg and Goldwasser, Shafi and Kreuter, Frauke and Reingold, Omer and Rothblum, Guy N. , title =. 2025 , eprint =

2025
[4]

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chan, Chi-Min and Chen, Weize and Su, Yusheng and Yu, Jianxuan and Xue, Wei and Zhang, Shanghang and Fu, Jie and Liu, Zhiyuan , title =. International Conference on Learning Representations (ICLR) , year =. 2308.07201 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[5]

2021 , eprint =

Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and others , title =. 2021 , eprint =

2021
[6]

, title =

Chuang, Yun-Shiuan and Goyal, Agam and Harlalka, Nikunj and Suresh, Siddharth and Hawkins, Robert and Yang, Sijia and Shah, Dhavan and Hu, Junjie and Rogers, Timothy T. , title =. Findings of the Association for Computational Linguistics: NAACL , year =
[7]

arXiv preprint arXiv:2404.10271 , year=

Conitzer, Vincent and Freedman, Rachel and Heitzig, Jobst and Holliday, Wesley H. and Jacobs, Bob M. and Lambert, Nathan and Mosse, Milan and Pacuit, Eric and Russell, Stuart and Schoelkopf, Hailey and others , title =. International Conference on Machine Learning (ICML) , year =. 2404.10271 , archivePrefix =

work page arXiv
[8]

and Mordatch, Igor , title =

Du, Yilun and Li, Shuang and Torralba, Antonio and Tenenbaum, Joshua B. and Mordatch, Igor , title =. 2023 , eprint =

2023
[9]

2023 , eprint =

Fu, Yao and Peng, Hao and Khot, Tushar and Lapata, Mirella , title =. 2023 , eprint =

2023
[10]

2023 , eprint =

Gao, Chen and Lan, Xiaochong and Lu, Zhihong and Mao, Jinzhu and Piao, Jinghua and Wang, Huandong and Jin, Depeng and Li, Yong , title =. 2023 , eprint =

2023
[11]

2023 , howpublished =

Glukhov, David and Shumailov, Ilia and Gal, Yarin and Papernot, Nicolas and Papyan, Vardan , title =. 2023 , howpublished =

2023
[12]

International Conference on Hybrid Human-Artificial Intelligence (HHAI) , year =

Gurcan, Onder , title =. International Conference on Hybrid Human-Artificial Intelligence (HHAI) , year =
[13]

2024 , eprint =

Han, Shanshan and Zhang, Qifan and Yao, Yuhang and Jin, Weizhao and Xu, Zhaozhuo and He, Chaoyang , title =. 2024 , eprint =

2024
[14]

2020 , eprint =

Hendrycks, Dan and Burns, Collin and Basart, Steven and Zou, Andy and Mazeika, Mantas and Song, Dawn and Steinhardt, Jacob , title =. 2020 , eprint =

2020
[15]

International Conference on Learning Representations (ICLR) , year =

Hong, Sirui and Zheng, Xiawu and Chen, Jonathan and Cheng, Yuheng and Wang, Jinlin and Zhang, Ceyao and Wang, Zili and Yau, Steven Ka Shing and Lin, Zijuan and Zhou, Liyang and others , title =. International Conference on Learning Representations (ICLR) , year =
[16]

International Conference on Machine Learning (ICML) , year =

Huang, Wenlong and Abbeel, Pieter and Pathak, Deepak and Mordatch, Igor , title =. International Conference on Machine Learning (ICML) , year =
[17]

2024 , eprint =

Immorlica, Nicole and Lucier, Brendan and Slivkins, Aleksandrs , title =. 2024 , eprint =

2024
[18]

2018 , eprint =

Irving, Geoffrey and Christiano, Paul and Amodei, Dario , title =. 2018 , eprint =

2018
[19]

and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , title =

Jimenez, Carlos E. and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , title =. International Conference on Learning Representations (ICLR) , year =
[20]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Koh, Jing Yu and Lo, Robert and Jang, Lawrence and Duvvur, Vikram and Lim, Ming Chong and Huang, Po-Yu and Neubig, Graham and Zhou, Shuyan and Salakhutdinov, Ruslan and Fried, Daniel , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =
[21]

2019 , eprint =

Lanctot, Marc and Lockhart, Edward and Lespiau, Jean-Baptiste and Zambaldi, Vinicius and Upadhyay, Satyaki and P. 2019 , eprint =

2019
[22]

and Zambaldi, Vinicius and Lanctot, Marc and Marecki, Janusz and Graepel, Thore , title =

Leibo, Joel Z. and Zambaldi, Vinicius and Lanctot, Marc and Marecki, Janusz and Graepel, Thore , title =. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) , year =
[23]

Leibo, Joel Z. and Du. Scalable Evaluation of Multi-Agent Reinforcement Learning with. International Conference on Machine Learning (ICML) , year =
[24]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[25]

2024 , eprint =

Li, Junkai and Wang, Siyu and Zhang, Meng and Li, Weitao and Lai, Yunghwei and Kang, Xinhui and Ma, Weizhi and Liu, Yang , title =. 2024 , eprint =

2024
[26]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Li, Nian and Gao, Chen and Li, Mingyu and Li, Yong and Liao, Qingmin , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =
[27]

International Conference on Learning Representations (ICLR) , year =

Liu, Xiao and Yu, Hao and Zhang, Hanchen and Xu, Yifan and Lei, Xuanyu and Lai, Hanyu and Gu, Yu and Ding, Hangliang and Men, Kaiwen and Yang, Kejuan and others , title =. International Conference on Learning Representations (ICLR) , year =
[28]

2024 , eprint =

Lu, Li-Chun and Chen, Shou-Jen and Pai, Tsung-Min and Yu, Chan-Hung and Lee, Hung-yi and Sun, Shao-Hua , title =. 2024 , eprint =

2024
[29]

Science , volume =

Human-Level Play in the Game of. Science , volume =
[30]

2023 , eprint =

Mialon, Gr. 2023 , eprint =

2023
[31]

and Stoica, Ion and Gonzalez, Joseph E

Packer, Charles and Wooders, Sarah and Lin, Kevin and Fang, Vivian and Patil, Shishir G. and Stoica, Ion and Gonzalez, Joseph E. , title =. 2024 , eprint =

2024
[32]

and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S

Park, Joon Sung and O'Brien, Joseph and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. ACM Symposium on User Interface Software and Technology (UIST) , year =
[33]

and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph E

Patil, Shishir G. and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph E. , title =. 2023 , eprint =

2023
[34]

Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Perez, Ethan and Huang, Saffron and Song, Francis and Cai, Trevor and Ring, Roman and Aslanides, John and Glaese, Amelia and McAleese, Nat and Irving, Geoffrey , title =. Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =
[35]

A Multi-Agent Reinforcement Learning Model of Common-Pool Resource Appropriation , booktitle =

P. A Multi-Agent Reinforcement Learning Model of Common-Pool Resource Appropriation , booktitle =
[36]

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of

Piatti, Giorgio and Jin, Zhijing and Kleiman-Weiner, Max and Sch. Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of. Advances in Neural Information Processing Systems (NeurIPS) , year =
[37]

International Conference on Learning Representations (ICLR) , year =

Qin, Yujia and Liang, Shihao and Ye, Yining and Zhu, Kunlun and Yan, Lan and Lu, Yaxi and Lin, Yankai and Cong, Xin and Tang, Xiangru and Qian, Bill and others , title =. International Conference on Learning Representations (ICLR) , year =
[38]

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Rawles, Christopher and Clinckemaillie, Sarah and Chang, Yifan and Waltz, Jonathan and Lau, Gabrielle and Fair, Marybeth and Li, Alice and Bishop, William and Li, Wei and Campbell-Ajala, Folawiyo and others , title =. International Conference on Learning Representations (ICLR) , year =. 2405.14573 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[39]

International Joint Conference on Artificial Intelligence (IJCAI) , year =

Ren, Siyue and Cui, Zhiyao and Song, Ruiqi and Wang, Zhen and Hu, Shuyue , title =. International Joint Conference on Artificial Intelligence (IJCAI) , year =
[40]

Samvelyan, Mikayel and Rashid, Tabish and de Witt, Christian Schroeder and Farquhar, Gregory and Nardelli, Nantas and Rudner, Tim G. J. and Hung, Chia-Man and Torr, Philip H. S. and Foerster, Jakob and Whiteson, Shimon , title =. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) , year =
[41]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Schaeffer, Rylan and Miranda, Brando and Koyejo, Sanmi , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[42]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Schick, Timo and Dwivedi-Yu, Jane and Dessi, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[43]

2023 , eprint =

Shah, Rusheb and Feuillade-Montixi, Quentin and Pour, Soroush and Tagade, Arush and Casper, Stephen and Rando, Javier , title =. 2023 , eprint =

2023
[44]

and Yao, Shunyu , title =

Shinn, Noah and Cassano, Federico and Gopinath, Ashwin and Narasimhan, Karthik R. and Yao, Shunyu , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[45]

2023 , howpublished =

2023
[46]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Wang, Lei and Xu, Wanyu and Lan, Yihuai and Hu, Zhiqiang and Lan, Yunshi and Lee, Roy Ka-Wei and Lim, Ee-Peng , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =
[47]

Transactions on Machine Learning Research (TMLR) , year =

Wang, Guanzhi and Xie, Yuqi and Jiang, Yunfan and Mandlekar, Ajay and Xiao, Chaowei and Zhu, Yuke and Fan, Linxi and Anandkumar, Anima , title =. Transactions on Machine Learning Research (TMLR) , year =
[48]

Transactions on Machine Learning Research (TMLR) , year =

Wei, Jason and Tay, Yi and Bommasani, Rishi and Raffel, Colin and Zoph, Barret and Borgeaud, Sebastian and Yogatama, Dani and Bosma, Maarten and Zhou, Denny and Metzler, Donald and others , title =. Transactions on Machine Learning Research (TMLR) , year =
[49]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[51]

Conference on Language Modeling (COLM) , year =

Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Zhang, Shaokun and Zhu, Erkang and Li, Beibin and Jiang, Li and Zhang, Xiaoyun and Wang, Chi , title =. Conference on Language Modeling (COLM) , year =
[52]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Xie, Tianbao and Zhang, Danyang and Chen, Jixuan and Li, Xiaochuan and Zhao, Siheng and Cao, Ruisheng and Hua, Toh Jing and Cheng, Zhoujun and Shin, Dongchan and Lei, Fangyu and others , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[53]

International Conference on Learning Representations (ICLR) , year =

Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan , title =. International Conference on Learning Representations (ICLR) , year =
[54]

and Cao, Yuan and Narasimhan, Karthik , title =

Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Thomas L. and Cao, Yuan and Narasimhan, Karthik , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[55]

2024 , eprint =

Yao, Shunyu and Shinn, Noah and Razavi, Pedram and Narasimhan, Karthik , title =. 2024 , eprint =

2024
[56]

AAAI Conference on Artificial Intelligence , year =

Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , title =. AAAI Conference on Artificial Intelligence , year =
[57]

and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham , title =

Zhou, Shuyan and Xu, Frank F. and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham , title =. International Conference on Learning Representations (ICLR) , year =
[58]

Zico and Fredrikson, Matt , title =

Zou, Andy and Wang, Zifan and Kolter, J. Zico and Fredrikson, Matt , title =. 2023 , eprint =

2023
[60]

2024 , eprint =

Su, Altera AI and others , title =. 2024 , eprint =

2024
[61]

Y. Wolf, N. Wies, O. Avnery, Y. Levine, and A. Shashua, ``Fundamental Limitations of Alignment in Large Language Models,'' Proceedings of the 41st International Conference on Machine Learning (ICML), 2024. arXiv:2304.11082. https://arxiv.org/abs/2304.11082

work page arXiv 2024
[62]

https://philarchive.org/archive/NOUNRL

``No Red Lines: The Impossibility of Formal Safety Guarantees in LLMs,'' PhilArchive preprint. https://philarchive.org/archive/NOUNRL
[63]

Glukhov, I

D. Glukhov, I. Shumailov, Y. Gal, N. Papernot, and V. Papyan, ``LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?'' arXiv preprint, 2023

2023
[65]

https://arxiv.org/abs/2602.05656

``On the Limits of Behavioral Alignment: Formal Verifiability and the Problem of Normative Indistinguishability,'' arXiv:2602.05656. https://arxiv.org/abs/2602.05656

work page arXiv
[66]

https://arxiv.org/abs/2603.28650

``Information-Theoretic Limits of Safety Verification for Self-Improving Systems,'' arXiv:2603.28650. https://arxiv.org/abs/2603.28650

work page arXiv
[67]

Brcic and R

M. Brcic and R. V. Yampolskiy, ``Impossibility Results in AI: A Survey,'' ACM Computing Surveys, 2023. arXiv:2109.00484. https://arxiv.org/abs/2109.00484

work page arXiv 2023
[68]

The Claude 3 model family

Anthropic . The Claude 3 model family. Technical report, Anthropic, 2024

2024
[69]

Rothblum

Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, and Guy N. Rothblum. On the impossibility of separating intelligence from judgment: The computational intractability of filtering for AI alignment. arXiv preprint arXiv:2507.07341, 2025

work page arXiv 2025
[70]

ChatEval : Towards better LLM -based evaluators through multi-agent debate

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. ChatEval : Towards better LLM -based evaluators through multi-agent debate. In International Conference on Learning Representations (ICLR), 2024

2024
[71]

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[72]

S$^3$: Social-network Simulation System with Large Language Model-Empowered Agents

Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, and Yong Li. S ^3 : Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[73]

LLM censorship: A machine learning challenge or a computer security problem? arXiv preprint, 2023

David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, and Vardan Papyan. LLM censorship: A machine learning challenge or a computer security problem? arXiv preprint, 2023

2023
[74]

Gemini : A family of highly capable multimodal models

Google DeepMind . Gemini : A family of highly capable multimodal models. Technical report, Google DeepMind, 2024

2024
[75]

Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench : Can language models resolve real-world GitHub issues? In International Conference on Learning Representations (ICLR), 2024

2024
[76]

Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2017

2017
[77]

Leibo, Edgar Du \'e \ n ez-Guzm \'a n, Alexander Vezhnevets, et al

Joel Z. Leibo, Edgar Du \'e \ n ez-Guzm \'a n, Alexander Vezhnevets, et al. Scalable evaluation of multi-agent reinforcement learning with Melting Pot . In International Conference on Machine Learning (ICML), 2021

2021
[78]

Agentsims: An open-source sandbox for large language model evaluation

Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, and Qin Chen. Agentsims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026, 2023

work page arXiv 2023
[79]

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378 0 (6624): 0 1067--1074, 2022

2022
[80]

GAIA: a benchmark for General AI Assistants

Gr \'e goire Mialon, Cl \'e mentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA : A benchmark for general AI assistants. arXiv preprint arXiv:2311.12983, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[81]

GPT-4 Technical Report

OpenAI . GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[82]

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT : Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[83]

Cai, Meredith Ringel Morris, Percy Liang, and Michael S

Joon Sung Park, Joseph O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In ACM Symposium on User Interface Software and Technology (UIST), 2023

2023

Showing first 80 references.

[1] [1]

2023 , eprint =

Abdelnabi, Sahar and Gomaa, Amr and Sivaprasad, Sarath and Sch. 2023 , eprint =

2023

[2] [2]

2019 , eprint =

Baker, Bowen and Kanitscheider, Ingmar and Markov, Todor and Wu, Yi and Powell, Glenn and McGrew, Bob and Mordatch, Igor , title =. 2019 , eprint =

2019

[3] [3]

, title =

Ball, Sarah and Gluch, Greg and Goldwasser, Shafi and Kreuter, Frauke and Reingold, Omer and Rothblum, Guy N. , title =. 2025 , eprint =

2025

[4] [4]

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chan, Chi-Min and Chen, Weize and Su, Yusheng and Yu, Jianxuan and Xue, Wei and Zhang, Shanghang and Fu, Jie and Liu, Zhiyuan , title =. International Conference on Learning Representations (ICLR) , year =. 2308.07201 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

2021 , eprint =

Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and others , title =. 2021 , eprint =

2021

[6] [6]

, title =

Chuang, Yun-Shiuan and Goyal, Agam and Harlalka, Nikunj and Suresh, Siddharth and Hawkins, Robert and Yang, Sijia and Shah, Dhavan and Hu, Junjie and Rogers, Timothy T. , title =. Findings of the Association for Computational Linguistics: NAACL , year =

[7] [7]

arXiv preprint arXiv:2404.10271 , year=

Conitzer, Vincent and Freedman, Rachel and Heitzig, Jobst and Holliday, Wesley H. and Jacobs, Bob M. and Lambert, Nathan and Mosse, Milan and Pacuit, Eric and Russell, Stuart and Schoelkopf, Hailey and others , title =. International Conference on Machine Learning (ICML) , year =. 2404.10271 , archivePrefix =

work page arXiv

[8] [8]

and Mordatch, Igor , title =

Du, Yilun and Li, Shuang and Torralba, Antonio and Tenenbaum, Joshua B. and Mordatch, Igor , title =. 2023 , eprint =

2023

[9] [9]

2023 , eprint =

Fu, Yao and Peng, Hao and Khot, Tushar and Lapata, Mirella , title =. 2023 , eprint =

2023

[10] [10]

2023 , eprint =

Gao, Chen and Lan, Xiaochong and Lu, Zhihong and Mao, Jinzhu and Piao, Jinghua and Wang, Huandong and Jin, Depeng and Li, Yong , title =. 2023 , eprint =

2023

[11] [11]

2023 , howpublished =

Glukhov, David and Shumailov, Ilia and Gal, Yarin and Papernot, Nicolas and Papyan, Vardan , title =. 2023 , howpublished =

2023

[12] [12]

International Conference on Hybrid Human-Artificial Intelligence (HHAI) , year =

Gurcan, Onder , title =. International Conference on Hybrid Human-Artificial Intelligence (HHAI) , year =

[13] [13]

2024 , eprint =

Han, Shanshan and Zhang, Qifan and Yao, Yuhang and Jin, Weizhao and Xu, Zhaozhuo and He, Chaoyang , title =. 2024 , eprint =

2024

[14] [14]

2020 , eprint =

Hendrycks, Dan and Burns, Collin and Basart, Steven and Zou, Andy and Mazeika, Mantas and Song, Dawn and Steinhardt, Jacob , title =. 2020 , eprint =

2020

[15] [15]

International Conference on Learning Representations (ICLR) , year =

Hong, Sirui and Zheng, Xiawu and Chen, Jonathan and Cheng, Yuheng and Wang, Jinlin and Zhang, Ceyao and Wang, Zili and Yau, Steven Ka Shing and Lin, Zijuan and Zhou, Liyang and others , title =. International Conference on Learning Representations (ICLR) , year =

[16] [16]

International Conference on Machine Learning (ICML) , year =

Huang, Wenlong and Abbeel, Pieter and Pathak, Deepak and Mordatch, Igor , title =. International Conference on Machine Learning (ICML) , year =

[17] [17]

2024 , eprint =

Immorlica, Nicole and Lucier, Brendan and Slivkins, Aleksandrs , title =. 2024 , eprint =

2024

[18] [18]

2018 , eprint =

Irving, Geoffrey and Christiano, Paul and Amodei, Dario , title =. 2018 , eprint =

2018

[19] [19]

and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , title =

Jimenez, Carlos E. and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , title =. International Conference on Learning Representations (ICLR) , year =

[20] [20]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Koh, Jing Yu and Lo, Robert and Jang, Lawrence and Duvvur, Vikram and Lim, Ming Chong and Huang, Po-Yu and Neubig, Graham and Zhou, Shuyan and Salakhutdinov, Ruslan and Fried, Daniel , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =

[21] [21]

2019 , eprint =

Lanctot, Marc and Lockhart, Edward and Lespiau, Jean-Baptiste and Zambaldi, Vinicius and Upadhyay, Satyaki and P. 2019 , eprint =

2019

[22] [22]

and Zambaldi, Vinicius and Lanctot, Marc and Marecki, Janusz and Graepel, Thore , title =

Leibo, Joel Z. and Zambaldi, Vinicius and Lanctot, Marc and Marecki, Janusz and Graepel, Thore , title =. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) , year =

[23] [23]

Leibo, Joel Z. and Du. Scalable Evaluation of Multi-Agent Reinforcement Learning with. International Conference on Machine Learning (ICML) , year =

[24] [24]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[25] [25]

2024 , eprint =

Li, Junkai and Wang, Siyu and Zhang, Meng and Li, Weitao and Lai, Yunghwei and Kang, Xinhui and Ma, Weizhi and Liu, Yang , title =. 2024 , eprint =

2024

[26] [26]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Li, Nian and Gao, Chen and Li, Mingyu and Li, Yong and Liao, Qingmin , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =

[27] [27]

International Conference on Learning Representations (ICLR) , year =

Liu, Xiao and Yu, Hao and Zhang, Hanchen and Xu, Yifan and Lei, Xuanyu and Lai, Hanyu and Gu, Yu and Ding, Hangliang and Men, Kaiwen and Yang, Kejuan and others , title =. International Conference on Learning Representations (ICLR) , year =

[28] [28]

2024 , eprint =

Lu, Li-Chun and Chen, Shou-Jen and Pai, Tsung-Min and Yu, Chan-Hung and Lee, Hung-yi and Sun, Shao-Hua , title =. 2024 , eprint =

2024

[29] [29]

Science , volume =

Human-Level Play in the Game of. Science , volume =

[30] [30]

2023 , eprint =

Mialon, Gr. 2023 , eprint =

2023

[31] [31]

and Stoica, Ion and Gonzalez, Joseph E

Packer, Charles and Wooders, Sarah and Lin, Kevin and Fang, Vivian and Patil, Shishir G. and Stoica, Ion and Gonzalez, Joseph E. , title =. 2024 , eprint =

2024

[32] [32]

and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S

Park, Joon Sung and O'Brien, Joseph and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , title =. ACM Symposium on User Interface Software and Technology (UIST) , year =

[33] [33]

and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph E

Patil, Shishir G. and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph E. , title =. 2023 , eprint =

2023

[34] [34]

Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Perez, Ethan and Huang, Saffron and Song, Francis and Cai, Trevor and Ring, Roman and Aslanides, John and Glaese, Amelia and McAleese, Nat and Irving, Geoffrey , title =. Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

[35] [35]

A Multi-Agent Reinforcement Learning Model of Common-Pool Resource Appropriation , booktitle =

P. A Multi-Agent Reinforcement Learning Model of Common-Pool Resource Appropriation , booktitle =

[36] [36]

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of

Piatti, Giorgio and Jin, Zhijing and Kleiman-Weiner, Max and Sch. Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of. Advances in Neural Information Processing Systems (NeurIPS) , year =

[37] [37]

International Conference on Learning Representations (ICLR) , year =

Qin, Yujia and Liang, Shihao and Ye, Yining and Zhu, Kunlun and Yan, Lan and Lu, Yaxi and Lin, Yankai and Cong, Xin and Tang, Xiangru and Qian, Bill and others , title =. International Conference on Learning Representations (ICLR) , year =

[38] [38]

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Rawles, Christopher and Clinckemaillie, Sarah and Chang, Yifan and Waltz, Jonathan and Lau, Gabrielle and Fair, Marybeth and Li, Alice and Bishop, William and Li, Wei and Campbell-Ajala, Folawiyo and others , title =. International Conference on Learning Representations (ICLR) , year =. 2405.14573 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv

[39] [39]

International Joint Conference on Artificial Intelligence (IJCAI) , year =

Ren, Siyue and Cui, Zhiyao and Song, Ruiqi and Wang, Zhen and Hu, Shuyue , title =. International Joint Conference on Artificial Intelligence (IJCAI) , year =

[40] [40]

Samvelyan, Mikayel and Rashid, Tabish and de Witt, Christian Schroeder and Farquhar, Gregory and Nardelli, Nantas and Rudner, Tim G. J. and Hung, Chia-Man and Torr, Philip H. S. and Foerster, Jakob and Whiteson, Shimon , title =. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) , year =

[41] [41]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Schaeffer, Rylan and Miranda, Brando and Koyejo, Sanmi , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[42] [42]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Schick, Timo and Dwivedi-Yu, Jane and Dessi, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[43] [43]

2023 , eprint =

Shah, Rusheb and Feuillade-Montixi, Quentin and Pour, Soroush and Tagade, Arush and Casper, Stephen and Rando, Javier , title =. 2023 , eprint =

2023

[44] [44]

and Yao, Shunyu , title =

Shinn, Noah and Cassano, Federico and Gopinath, Ashwin and Narasimhan, Karthik R. and Yao, Shunyu , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[45] [45]

2023 , howpublished =

2023

[46] [46]

Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Wang, Lei and Xu, Wanyu and Lan, Yihuai and Hu, Zhiqiang and Lan, Yunshi and Lee, Roy Ka-Wei and Lim, Ee-Peng , title =. Annual Meeting of the Association for Computational Linguistics (ACL) , year =

[47] [47]

Transactions on Machine Learning Research (TMLR) , year =

Wang, Guanzhi and Xie, Yuqi and Jiang, Yunfan and Mandlekar, Ajay and Xiao, Chaowei and Zhu, Yuke and Fan, Linxi and Anandkumar, Anima , title =. Transactions on Machine Learning Research (TMLR) , year =

[48] [48]

Transactions on Machine Learning Research (TMLR) , year =

Wei, Jason and Tay, Yi and Bommasani, Rishi and Raffel, Colin and Zoph, Barret and Borgeaud, Sebastian and Yogatama, Dani and Bosma, Maarten and Zhou, Denny and Metzler, Donald and others , title =. Transactions on Machine Learning Research (TMLR) , year =

[49] [49]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[50] [51]

Conference on Language Modeling (COLM) , year =

Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Zhang, Shaokun and Zhu, Erkang and Li, Beibin and Jiang, Li and Zhang, Xiaoyun and Wang, Chi , title =. Conference on Language Modeling (COLM) , year =

[51] [52]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Xie, Tianbao and Zhang, Danyang and Chen, Jixuan and Li, Xiaochuan and Zhao, Siheng and Cao, Ruisheng and Hua, Toh Jing and Cheng, Zhoujun and Shin, Dongchan and Lei, Fangyu and others , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[52] [53]

International Conference on Learning Representations (ICLR) , year =

Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan , title =. International Conference on Learning Representations (ICLR) , year =

[53] [54]

and Cao, Yuan and Narasimhan, Karthik , title =

Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Thomas L. and Cao, Yuan and Narasimhan, Karthik , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =

[54] [55]

2024 , eprint =

Yao, Shunyu and Shinn, Noah and Razavi, Pedram and Narasimhan, Karthik , title =. 2024 , eprint =

2024

[55] [56]

AAAI Conference on Artificial Intelligence , year =

Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , title =. AAAI Conference on Artificial Intelligence , year =

[56] [57]

and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham , title =

Zhou, Shuyan and Xu, Frank F. and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham , title =. International Conference on Learning Representations (ICLR) , year =

[57] [58]

Zico and Fredrikson, Matt , title =

Zou, Andy and Wang, Zifan and Kolter, J. Zico and Fredrikson, Matt , title =. 2023 , eprint =

2023

[58] [60]

2024 , eprint =

Su, Altera AI and others , title =. 2024 , eprint =

2024

[59] [61]

Y. Wolf, N. Wies, O. Avnery, Y. Levine, and A. Shashua, ``Fundamental Limitations of Alignment in Large Language Models,'' Proceedings of the 41st International Conference on Machine Learning (ICML), 2024. arXiv:2304.11082. https://arxiv.org/abs/2304.11082

work page arXiv 2024

[60] [62]

https://philarchive.org/archive/NOUNRL

``No Red Lines: The Impossibility of Formal Safety Guarantees in LLMs,'' PhilArchive preprint. https://philarchive.org/archive/NOUNRL

[61] [63]

Glukhov, I

D. Glukhov, I. Shumailov, Y. Gal, N. Papernot, and V. Papyan, ``LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?'' arXiv preprint, 2023

2023

[62] [65]

https://arxiv.org/abs/2602.05656

``On the Limits of Behavioral Alignment: Formal Verifiability and the Problem of Normative Indistinguishability,'' arXiv:2602.05656. https://arxiv.org/abs/2602.05656

work page arXiv

[63] [66]

https://arxiv.org/abs/2603.28650

``Information-Theoretic Limits of Safety Verification for Self-Improving Systems,'' arXiv:2603.28650. https://arxiv.org/abs/2603.28650

work page arXiv

[64] [67]

Brcic and R

M. Brcic and R. V. Yampolskiy, ``Impossibility Results in AI: A Survey,'' ACM Computing Surveys, 2023. arXiv:2109.00484. https://arxiv.org/abs/2109.00484

work page arXiv 2023

[65] [68]

The Claude 3 model family

Anthropic . The Claude 3 model family. Technical report, Anthropic, 2024

2024

[66] [69]

Rothblum

Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, and Guy N. Rothblum. On the impossibility of separating intelligence from judgment: The computational intractability of filtering for AI alignment. arXiv preprint arXiv:2507.07341, 2025

work page arXiv 2025

[67] [70]

ChatEval : Towards better LLM -based evaluators through multi-agent debate

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. ChatEval : Towards better LLM -based evaluators through multi-agent debate. In International Conference on Learning Representations (ICLR), 2024

2024

[68] [71]

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[69] [72]

S$^3$: Social-network Simulation System with Large Language Model-Empowered Agents

Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, and Yong Li. S ^3 : Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[70] [73]

LLM censorship: A machine learning challenge or a computer security problem? arXiv preprint, 2023

David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, and Vardan Papyan. LLM censorship: A machine learning challenge or a computer security problem? arXiv preprint, 2023

2023

[71] [74]

Gemini : A family of highly capable multimodal models

Google DeepMind . Gemini : A family of highly capable multimodal models. Technical report, Google DeepMind, 2024

2024

[72] [75]

Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench : Can language models resolve real-world GitHub issues? In International Conference on Learning Representations (ICLR), 2024

2024

[73] [76]

Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2017

2017

[74] [77]

Leibo, Edgar Du \'e \ n ez-Guzm \'a n, Alexander Vezhnevets, et al

Joel Z. Leibo, Edgar Du \'e \ n ez-Guzm \'a n, Alexander Vezhnevets, et al. Scalable evaluation of multi-agent reinforcement learning with Melting Pot . In International Conference on Machine Learning (ICML), 2021

2021

[75] [78]

Agentsims: An open-source sandbox for large language model evaluation

Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, and Qin Chen. Agentsims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026, 2023

work page arXiv 2023

[76] [79]

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378 0 (6624): 0 1067--1074, 2022

2022

[77] [80]

GAIA: a benchmark for General AI Assistants

Gr \'e goire Mialon, Cl \'e mentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA : A benchmark for general AI assistants. arXiv preprint arXiv:2311.12983, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[78] [81]

GPT-4 Technical Report

OpenAI . GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[79] [82]

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT : Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[80] [83]

Cai, Meredith Ringel Morris, Percy Liang, and Michael S

Joon Sung Park, Joseph O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In ACM Symposium on User Interface Software and Technology (UIST), 2023

2023