Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning

Shimon Whiteson; Tabish Rashid; Wendelin Böhmer

arxiv: 1906.02138 · v1 · pith:YPS4WGHHnew · submitted 2019-06-05 · 💻 cs.AI



keywords: agents, reward, decentralized, exploration, intrinsic, learning, unreliable, agent


This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL), in which decentralized agents share control and an experience replay buffer with a centralized agent. Only the centralized agent is intrinsically rewarded, but the decentralized agents still benefit from improved exploration, without the distraction of unreliable incentives.
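The division of labour described in the abstract can be illustrated with a small tabular sketch. The abstract does not say how control is shared, what form the intrinsic reward takes, or which function approximators are used, so everything below is an assumption made for illustration: the class name `ICQLSketch`, the count-based bonus, the coin flip that picks the controller for each episode, the hyperparameters, and the one-step matrix game. It is not the authors' implementation. What it does show is the structure the abstract names: one shared replay buffer, a centralized learner over joint actions trained on extrinsic plus intrinsic reward, and per-agent learners trained on extrinsic reward only.

```python
import random
from collections import defaultdict, deque
from itertools import product

N_AGENTS, N_ACTIONS = 2, 2
JOINT_ACTIONS = list(product(range(N_ACTIONS), repeat=N_AGENTS))
GAMMA, ALPHA, BETA = 0.9, 0.5, 1.0  # hypothetical hyperparameters


class ICQLSketch:
    """Tabular illustration only; all design details are assumptions."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.buffer = deque(maxlen=10_000)   # replay buffer shared by all learners
        self.q_central = defaultdict(float)  # (state, joint_action) -> value
        self.q_agents = [defaultdict(float) for _ in range(N_AGENTS)]  # (state, action) -> value
        self.counts = defaultdict(int)       # visit counts for the assumed bonus

    def intrinsic(self, state, joint_action):
        # Assumed count-based bonus; the abstract does not specify the intrinsic reward.
        return BETA / max(1, self.counts[(state, joint_action)]) ** 0.5

    def act(self, state, use_central, epsilon=0.1):
        if self.rng.random() < epsilon:
            return self.rng.choice(JOINT_ACTIONS)
        if use_central:
            # The centralized agent picks the joint action directly.
            return max(JOINT_ACTIONS, key=lambda ja: self.q_central[(state, ja)])
        # Each decentralized agent picks its own action independently.
        return tuple(max(range(N_ACTIONS), key=lambda a: q[(state, a)])
                     for q in self.q_agents)

    def store(self, state, joint_action, reward, next_state, done):
        self.counts[(state, joint_action)] += 1
        self.buffer.append((state, joint_action, reward, next_state, done))

    def train_step(self, batch_size=8):
        pool = list(self.buffer)
        for s, ja, r, s2, done in self.rng.sample(pool, min(batch_size, len(pool))):
            # Centralized learner: extrinsic reward plus intrinsic bonus.
            boot = 0.0 if done else max(self.q_central[(s2, b)] for b in JOINT_ACTIONS)
            target = r + self.intrinsic(s, ja) + GAMMA * boot
            self.q_central[(s, ja)] += ALPHA * (target - self.q_central[(s, ja)])
            # Decentralized learners: extrinsic reward only, same transitions.
            for q, a in zip(self.q_agents, ja):
                boot = 0.0 if done else max(q[(s2, b)] for b in range(N_ACTIONS))
                q[(s, a)] += ALPHA * (r + GAMMA * boot - q[(s, a)])


def run_matrix_game(episodes=50, seed=0):
    """Hypothetical one-step game: reward 1 only if both agents choose action 1."""
    learner = ICQLSketch(seed)
    for _ in range(episodes):
        use_central = learner.rng.random() < 0.5  # assumed way of sharing control
        ja = learner.act(0, use_central)
        reward = 1.0 if ja == (1, 1) else 0.0
        learner.store(0, ja, reward, 0, True)
        learner.train_step()
    return learner
```

Calling `run_matrix_game()` fills the shared buffer with transitions generated by both kinds of controller. In this sketch the per-agent value tables only ever see the extrinsic reward, so the exploration bonus can change which transitions they learn from but not the targets they learn toward.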


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
cs.MA · 2025-02 · unverdicted · novelty 6.0

Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.