Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
arXiv preprint arXiv:1703.04908 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
By capturing statistical patterns in large corpora, machine learning has enabled significant advances in natural language processing, including in machine translation, question answering, and sentiment analysis. However, for agents to intelligently interact with humans, simply capturing the statistical patterns is insufficient. In this paper we investigate if, and how, grounded compositional language can emerge as a means to achieve goals in multi-agent populations. Towards this end, we propose a multi-agent learning environment and learning methods that bring about emergence of a basic compositional language. This language is represented as streams of abstract discrete symbols uttered by agents over time, but nonetheless has a coherent structure that possesses a defined vocabulary and syntax. We also observe emergence of non-verbal communication such as pointing and guiding when language communication is unavailable.
fields
cs.LG 3representative citing papers
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
citing papers explorer
-
Arena: a toolkit for Multi-Agent Reinforcement Learning
Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
-
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
-
Do LLM-derived graph priors improve multi-agent coordination?
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.