Alympics: Language agents meet game theory
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
  Market-Bench is a new multi-agent benchmark showing that LLMs display large performance gaps in economic tasks, with only a few consistently growing capital while most break even despite similar ad quality.
- Decision-Theoretic Safety Assessment of Persona-Driven Multi-Agent Systems in O-RAN
  A persona-driven multi-agent framework with a three-dimensional decision-theoretic evaluation shows that agent-persona alignment significantly impacts performance and coordination in O-RAN optimization challenges.
- SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model
  SOM uses a Structural Causal Model to create an explicit graph of opponent observation-to-action links, allowing LLMs to reason along those paths for more accurate and stable predictions in multi-agent settings.
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges
  The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.