Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications
Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
Forward citations
Cited by 3 Pith papers
- ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning
  ARMS is an automatic reward-shaping framework for sparse-reward MARL that uses trajectory ranking and conditional best-response reasoning to preserve Nash equilibria while improving sampling efficiency in pathfinding tasks.
- An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments
  An attention-augmented actor-critic agent learns to dynamically weight multiple environment views by importance and outperforms baselines on TORCS and three other 3D simulators under noise and partial observability.
- A Deep Reinforcement Learning Approach for Global Routing
  Deep RL agent trained on generated global routing instances outperforms sequential A* search.