pith. machine review for the scientific record.

arxiv: 1810.06394 · v1 · submitted 2018-10-10 · 💻 cs.LG · cs.AI · stat.ML

Recognition: unknown

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.AI · stat.ML
keywords: space · action · hybrid · continuous · deep · learning · consider · dealing
Original abstract

Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely. Motivated by applications in computer games, we consider the scenario with discrete-continuous hybrid action space. To handle hybrid action space, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirits of both DQN (dealing with discrete action space) and DDPG (dealing with continuous action space) by seamlessly integrating them. Empirical results on a simulation example, scoring a goal in simulated RoboCup soccer and the solo mode in game King of Glory (KOG) validate the efficiency and effectiveness of our method.
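The abstract's core idea, selecting a hybrid action (a discrete choice plus its continuous parameter) without discretizing or relaxing the space, can be sketched as follows. This is a minimal illustration under assumed shapes, not the paper's implementation: the two networks are stand-in linear/tanh maps, with `param_net` playing the DDPG-style actor that outputs continuous parameters x_k(s) for every discrete action k, and `q_net` playing the DQN-style critic Q(s, k, x_k). All names and dimensions (`STATE_DIM`, `NUM_DISCRETE`, `PARAM_DIM`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, NUM_DISCRETE, PARAM_DIM = 4, 3, 2  # assumed toy dimensions

# Hypothetical stand-ins for the two networks in a P-DQN-style agent.
W_x = rng.normal(size=(NUM_DISCRETE, PARAM_DIM, STATE_DIM))  # "actor" weights
W_q = rng.normal(size=(NUM_DISCRETE, STATE_DIM + PARAM_DIM))  # "critic" weights

def param_net(state):
    """Deterministic continuous parameters x_k(s) for every discrete action k."""
    return np.tanh(W_x @ state)  # shape (NUM_DISCRETE, PARAM_DIM)

def q_net(state, k, x_k):
    """Scalar Q(s, k, x_k) for one discrete action and its parameter vector."""
    return float(W_q[k] @ np.concatenate([state, x_k]))

def select_action(state):
    """Greedy hybrid step: k* = argmax_k Q(s, k, x_k(s)); act with (k*, x_{k*})."""
    params = param_net(state)
    q_vals = [q_net(state, k, params[k]) for k in range(NUM_DISCRETE)]
    k_star = int(np.argmax(q_vals))
    return k_star, params[k_star]

k, x = select_action(rng.normal(size=STATE_DIM))
print(k, x.shape)  # one discrete index plus its continuous parameter vector
```

The design point the abstract emphasizes is visible here: the continuous parameters are computed for all discrete actions first, and the discrete maximization is then exact over that finite set, so no approximation of the hybrid space is needed.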

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

    cs.LG 2026-05 unverdicted novelty 7.0

    HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.

  2. PALCAS: A Priority-Aware Intelligent Lane Change Advisory System for Autonomous Vehicles using Federated Reinforcement Learning

    cs.RO 2026-04 unverdicted novelty 5.0

    PALCAS applies federated RL and a novel priority-aware safe lane-change reward to improve AV lane changing decisions, reporting gains in efficiency, safety, comfort, arrival rates, and merging in SUMO simulations.