
arxiv: 1903.04959 · v1 · pith:ZWDOH2GW · submitted 2019-03-12 · cs.LG · cs.AI · cs.MA · stat.ML

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

classification: cs.LG · cs.AI · cs.MA · stat.ML
keywords: deep, multi-agent, action, spaces, hybrid, parameterized, different, discrete-continuous
abstract

Deep Reinforcement Learning (DRL) has been applied to a variety of cooperative multi-agent problems with either discrete or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized-training, decentralized-execution paradigm: different levels of communication between agents are used to facilitate training, while during execution each agent runs its policy independently based on its local observations. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
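To make "discrete-continuous hybrid (parameterized) action" and "each agent executes its policy independently based on local observations" concrete, here is a minimal sketch of parameterized action selection in the style of P-DQN (parameterized deep Q-learning). It assumes the independent baseline named in the abstract follows this general scheme. It is not Deep MAPQN or Deep MAHHQN: the centralized training with inter-agent communication is omitted entirely. The dash/turn/kick action set, its bounds, and the functions `toy_param_net` and `toy_q_net` are hypothetical stand-ins for learned networks. `clip_params` and `select_hybrid_action` are likewise illustrative names, not the paper's code.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Bounds = List[Tuple[float, float]]       # (low, high) for each continuous parameter
HybridAction = Tuple[int, List[float]]   # (discrete action k, its parameters x_k)


def clip_params(params: Sequence[float], bounds: Bounds) -> List[float]:
    """Clamp each continuous parameter into its (low, high) range."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]


def select_hybrid_action(
    obs: Sequence[float],
    bounds: List[Bounds],
    param_net: Callable[[Sequence[float], int], List[float]],
    q_net: Callable[[Sequence[float], int, List[float]], float],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> HybridAction:
    """P-DQN-style choice: propose x_k for every k, then take argmax_k Q(obs, k, x_k)."""
    rng = rng or random.Random()
    n = len(bounds)
    # 1. Propose (and clip) continuous parameters for every discrete action.
    params = [clip_params(param_net(obs, k), bounds[k]) for k in range(n)]
    # 2. Epsilon-greedy exploration over the discrete choice only.
    if rng.random() < epsilon:
        k = rng.randrange(n)
        return k, params[k]
    # 3. Score each fully specified (k, x_k) pair and keep the highest-valued one.
    q_values = [q_net(obs, k, params[k]) for k in range(n)]
    best = max(range(n), key=lambda k: q_values[k])
    return best, params[best]


# Hypothetical soccer-like action set, for illustration only.
BOUNDS: List[Bounds] = [
    [(0.0, 100.0), (-180.0, 180.0)],  # 0: dash(power, direction)
    [(-180.0, 180.0)],                # 1: turn(direction)
    [(0.0, 100.0), (-180.0, 180.0)],  # 2: kick(power, direction)
]


def toy_param_net(obs: Sequence[float], k: int) -> List[float]:
    """Hypothetical stand-in for a learned parameter network; obs = (distance, angle)."""
    dist, angle = obs
    if k == 1:
        return [angle]             # turn toward the ball
    return [dist * 40.0, angle]    # power grows with distance (clipped later)


def toy_q_net(obs: Sequence[float], k: int, x: List[float]) -> float:
    """Hypothetical stand-in for a learned Q-network scoring (k, x_k)."""
    dist, _ = obs
    if k == 0:
        return dist + x[0] / 1000.0          # dash pays off when far from the ball
    if k == 1:
        return abs(x[0]) / 90.0              # turn pays off when badly oriented
    return (1.0 - dist) + x[0] / 1000.0      # kick pays off when close


if __name__ == "__main__":
    # Decentralized execution: each agent acts on its own local observation only.
    local_obs = {"agent_a": (3.0, 10.0), "agent_b": (0.25, 5.0), "agent_c": (0.75, 150.0)}
    for name, obs in local_obs.items():
        print(name, select_hybrid_action(obs, BOUNDS, toy_param_net, toy_q_net))
```

Run as a script, this prints `agent_a (0, [100.0, 10.0])`, `agent_b (2, [10.0, 5.0])` and `agent_c (1, [150.0])`. The far-away agent dashes at clipped full power, the close agent kicks, and the badly oriented agent turns. Because every discrete action gets its own parameter proposal, the argmax compares fully specified hybrid actions. A hierarchical alternative would choose k first and then generate only x_k for it. This sketch implements neither multi-agent method from the paper.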



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

    cs.AI 2026-06 unverdicted novelty 6.0

    ASALT uses observation-level and state-level adapters to align mismatched dimensionalities into a shared embedding for transferring actors and critics in MARL, showing improved sample efficiency and reduced negative t...

  2. TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

    cs.LG 2026-06 unverdicted novelty 5.0

    TRIDENT is a MARL framework using Richardson-Romberg gradient correction, Lyapunov-constrained trust-region updates, and a physics-informed residual critic that claims O(1/sqrt(K)) convergence to constrained Nash equi...