Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning
read the original abstract
This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL), in which decentralized agents share control and an experience replay buffer with a centralized agent. Only the centralized agent is intrinsically rewarded, but the decentralized agents still benefit from improved exploration, without the distraction of unreliable incentives.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Optimistic {\epsilon}-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.