arxiv: 1812.11794 · v2 · pith:VJEY4CXLnew · submitted 2018-12-31 · 💻 cs.LG · cs.AI · cs.MA · stat.ML

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

classification 💻 cs.LG · cs.AI · cs.MA · stat.ML
keywords learning · multi-agent · deep · methods · problems · agents · algorithms · applications

Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, with their corresponding applications explored. It is envisaged that this review provides insights about various MADRL methods and can lead to future development of more robust and highly useful multi-agent learning methods for solving real-world problems.

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning

    cs.MA 2026-05 unverdicted novelty 7.0

    ARMS is an automatic reward-shaping framework for sparse-reward MARL that uses trajectory ranking and conditional best-response reasoning to preserve Nash equilibria while improving sampling efficiency in pathfinding tasks.

  2. An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments

    cs.LG 2019-07 unverdicted novelty 4.0

    An attention-augmented actor-critic agent learns to dynamically weight multiple environment views by importance and outperforms baselines on TORCS and three other 3D simulators under noise and partial observability.

  3. A Deep Reinforcement Learning Approach for Global Routing

    cs.LG 2019-06 unverdicted novelty 4.0

    A deep RL agent trained on generated global routing instances outperforms sequential A* search.