arxiv: 2310.10701 · v3 · pith:3KM37UIOnew · submitted 2023-10-16 · 💻 cs.CL · cs.AI

Theory of Mind for Multi-Agent Collaboration via Large Language Models

keywords: agents, llm-based, multi-agent, mind, theory, language, large, models

While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaboration remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 7.0

    Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with the ExpRAG baseline and the proposed ReMem method, which integrates reasoning, actions, and memory updates for continual improvement.

  2. The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

    cs.AI 2026-04 unverdicted novelty 6.0

    TEA Nets extracts agents, events, and targets from text to reveal emotional and semantic patterns in conspiracy theories and psychotherapy transcripts from humans and LLMs.

  3. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 6.0

    Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and ...

  4. GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

    cs.AI 2025-07 unverdicted novelty 6.0

    GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming pri...

  5. Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities

    cs.HC 2025-07 unverdicted novelty 6.0

    A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.

  6. CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems

    cs.MA 2026-05 unverdicted novelty 5.0

    CONCAT introduces a consensus- and confidence-driven ad hoc teaming method that reduces communication overhead in LLM-based multi-agent systems, cutting latency by up to 50% while improving the efficiency ratio, without any training.

  7. Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

    cs.AI 2026-04 unverdicted novelty 4.0

    A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.

  8. Network Effects and Agreement Drift in LLM Debates

    cs.SI 2026-04 unverdicted novelty 4.0

    LLM agents in controlled network debates show agreement drift toward specific opinion positions, requiring separation of structural effects from LLM biases before using them as human behavioral proxies.

  9. A Survey of Reinforcement Learning for Large Reasoning Models

    cs.CL 2025-09 accept novelty 3.0

    A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.