Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Changjie Fan; Haotian Fu; Hongyao Tang; Jianye Hao; Yingfeng Chen; Zihan Lei

arxiv: 1903.04959 · v1 · pith:ZWDOH2GWnew · submitted 2019-03-12 · 💻 cs.LG · cs.AI · cs.MA · stat.ML

Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan

classification 💻 cs.LG · cs.AI · cs.MA · stat.ML

keywords deep · multi-agent · action · spaces · hybrid · parameterized · different · discrete-continuous

Deep Reinforcement Learning (DRL) has been applied to a variety of cooperative multi-agent problems with either discrete or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the paradigm of centralized training with decentralized execution: different levels of communication between agents are used to facilitate training, while during execution each agent runs its policy independently based on local observations. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
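As a rough illustration of what a discrete-continuous hybrid (parameterized) action looks like, the sketch below has each agent choose a discrete action type together with one continuous parameter, using only its own local observation. It is not the Deep MAPQN or Deep MAHHQN architecture and it shows no centralized training. The names `HybridAction`, `greedy_hybrid_action`, `param_policies`, and `toy_q`, as well as the "dash"/"turn" action types, are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HybridAction:
    kind: str     # discrete choice (hypothetical examples: "dash", "turn")
    param: float  # continuous parameter attached to that choice


def greedy_hybrid_action(
    obs: List[float],
    param_policies: Dict[str, Callable[[List[float]], float]],
    q_value: Callable[[List[float], str, float], float],
) -> HybridAction:
    """Propose one parameter per discrete action, then keep the best-scoring pair."""
    candidates = [(kind, policy(obs)) for kind, policy in param_policies.items()]
    best_kind, best_param = max(candidates, key=lambda c: q_value(obs, c[0], c[1]))
    return HybridAction(best_kind, best_param)


# Toy stand-ins for learned components (assumptions, not taken from the paper).
param_policies = {
    "dash": lambda obs: 0.5 * obs[0],
    "turn": lambda obs: -obs[1],
}


def toy_q(obs: List[float], kind: str, param: float) -> float:
    # Scores a (kind, param) pair by how close the parameter is to a fixed target.
    target = {"dash": 1.0, "turn": 0.0}[kind]
    return -(param - target) ** 2


# Decentralized execution: every agent acts on its own local observation only.
local_observations = {"agent_0": [2.0, 0.3], "agent_1": [0.2, 0.0]}
joint_action = {
    name: greedy_hybrid_action(obs, param_policies, toy_q)
    for name, obs in local_observations.items()
}
```

With these toy functions, `agent_0` picks the "dash" action and `agent_1` picks the "turn" action, each paired with its own continuous parameter. In a learned system the parameter proposals and the value function would be neural networks rather than fixed lambdas.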

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning
cs.AI 2026-06 unverdicted novelty 6.0

ASALT uses observation-level and state-level adapters to align mismatched dimensionalities into a shared embedding for transferring actors and critics in MARL, showing improved sample efficiency and reduced negative t...

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning
cs.LG 2026-06 unverdicted novelty 5.0

TRIDENT is a MARL framework using Richardson-Romberg gradient correction, Lyapunov-constrained trust-region updates, and a physics-informed residual critic that claims O(1/sqrt(K)) convergence to constrained Nash equi...