
arxiv: 1805.11711 · v1 · pith:IWPBDFQ2 · submitted 2018-05-29 · cs.LG · cs.AI · stat.ML

Depth and nonlinearity induce implicit exploration for RL

classification: cs.LG · cs.AI · stat.ML
keywords: exploration, depth, greedy, induce, learn, nonlinearity, question, result

The question of how to explore, i.e., how to take actions with uncertain outcomes in order to learn about possible future rewards, is a key question in reinforcement learning (RL). Here, we show a surprising result: Q-learning with a nonlinear Q-function and no explicit exploration (i.e., a purely greedy policy) can learn several standard benchmark tasks, including mountain car, as well as or better than the most commonly used $\epsilon$-greedy exploration. We carefully examine this result and show that both the depth of the Q-network and the type of nonlinearity are important to induce such deterministic exploration.
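To make the contrast between the two policies concrete, here is a minimal Python sketch, assuming a discrete action set and Q-values already computed for the current state. It is not the authors' code: the function names `greedy_action`, `epsilon_greedy_action` and `q_learning_target` are hypothetical, and the Q-values are plain lists of numbers, whereas in the paper they come from a nonlinear Q-network. The point it illustrates is that the greedy policy injects no randomness of its own, so any exploration it performs must come from how the learned Q-estimates change during training.

```python
import random
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Purely greedy policy: always take the action with the highest Q-value.

    Ties go to the lowest index (an assumption; any deterministic rule works).
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """Epsilon-greedy: with probability epsilon take a uniformly random action,
    otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def q_learning_target(reward: float, next_q_values: Sequence[float],
                      gamma: float, done: bool) -> float:
    """One-step Q-learning target r + gamma * max_a' Q(s', a').

    No bootstrapping from the next state when the episode has terminated.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

With `epsilon = 0.0` the epsilon-greedy policy reduces to the purely greedy one, which is the setting the abstract reports as competitive on benchmarks such as mountain car.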
