Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

Gabriel Synnaeve; Nicolas Usunier; Soumith Chintala; Zeming Lin

arXiv: 1609.02993 · v3 · pith:5NZEEMNS · submitted 2016-09-10 · cs.AI · cs.LG

Abstract

We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no obvious feature representation for the state-action evaluation function. We describe our approach to tackle the micromanagement scenarios with deep neural network controllers from raw state features given by the game engine. In addition, we present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm allows for the collection of traces for learning using deterministic policies, which appears much more efficient than, for example, ε-greedy exploration. Experiments show that with this algorithm, we successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.
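
The core idea in the abstract, exploring in policy (parameter) space so that each episode is played with a fixed deterministic policy, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a hypothetical linear action scorer, a toy reward that favors the action a hidden target vector `w_star` would pick, one random unit-direction perturbation per episode, and a baseline-corrected random-direction (zero-order) update. It omits the backpropagation into deeper layers that the abstract mentions. All names (`train`, `run_episode`, `make_episode`, etc.) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy setup (not from the paper): at each step the agent sees k
# candidate actions, each a d-dimensional feature vector, and scores them with
# a linear function w . phi. It is rewarded when it picks the same action that
# a hidden target vector w_star would pick.


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_unit_vector(d: int, rng: random.Random) -> List[float]:
    """Uniformly random direction on the unit sphere of R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def greedy_action(w: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Deterministic policy: index of the highest-scoring candidate (first on ties)."""
    scores = [dot(w, phi) for phi in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def make_episode(seed: int, steps: int, k: int, d: int) -> List[List[List[float]]]:
    """Hypothetical episode: for each step, k candidate-action feature vectors."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(k)]
            for _ in range(steps)]


def run_episode(w: Sequence[float], episode, w_star: Sequence[float]) -> float:
    """Play one episode with a fixed deterministic policy; return the mean reward."""
    hits = sum(greedy_action(w, c) == greedy_action(w_star, c) for c in episode)
    return hits / len(episode)


def train(w_star: Sequence[float], n_episodes: int = 200, steps: int = 20,
          k: int = 5, delta: float = 0.5, lr: float = 0.5,
          seed: int = 0) -> Tuple[List[float], List[float]]:
    d = len(w_star)
    rng = random.Random(seed)
    w = [0.0] * d
    baseline = 0.0
    history = []
    for ep in range(n_episodes):
        # Explore once per episode: perturb the parameters along a random direction.
        u = sample_unit_vector(d, rng)
        perturbed = [wi + delta * ui for wi, ui in zip(w, u)]
        episode = make_episode(seed * 100003 + ep, steps, k, d)
        # The rollout itself is deterministic (no per-step randomness as in epsilon-greedy).
        ret = run_episode(perturbed, episode, w_star)
        # Zero-order update: move along u in proportion to the return minus a
        # running-average baseline (the d/delta scale is folded into lr).
        adv = ret - baseline
        w = [wi + lr * adv * ui for wi, ui in zip(w, u)]
        baseline = 0.9 * baseline + 0.1 * ret
        history.append(ret)
    return w, history
```

Calling `train([1.0, -0.5, 0.25])` returns the learned weights and the per-episode returns of the perturbed policies. The design choice being illustrated is that randomness enters only once per episode, through the parameter perturbation `u`, so every collected trace comes from a single deterministic policy; ε-greedy exploration instead randomizes each action independently.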

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Growing Action Spaces
cs.LG · 2019-06 · unverdicted · novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.