
arxiv: 1905.06125 · v1 · pith:IBNWYLZPnew · submitted 2019-05-13 · 💻 cs.LG · cs.AI · stat.ML

Distributional Reinforcement Learning for Efficient Exploration

classification 💻 cs.LG · cs.AI · stat.ML
keywords exploration, games, qr-dqn, algorithm, distribution, distributional, efficient, intrinsic

In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain in cumulative rewards over QR-DQN across 49 games, with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
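The abstract names two components but gives no formulas, so the following is only a minimal sketch of how they might fit together, not the authors' implementation. It assumes a schedule of the form c * sqrt(log t / t) and a bonus equal to the square root of the squared deviations of the upper-half quantiles from the median, scaled by 1/(2N). The function names, the constant `c=50.0`, and the exact form of both terms are hypothetical choices made for illustration.

```python
import math
from statistics import mean, median


def decay_coefficient(t, c=50.0):
    """Assumed decaying schedule: c * sqrt(log t / t), shrinking toward 0 as t grows."""
    t = max(int(t), 1)
    return c * math.sqrt(math.log(t) / t)


def upper_quantile_bonus(quantiles):
    """Assumed bonus: spread of the upper-half quantiles around the median.

    `quantiles` is the list of N quantile estimates for one action.
    """
    q = sorted(quantiles)
    n = len(q)
    med = median(q)
    upper = q[n // 2:]  # quantiles at or above the median position
    var_plus = sum((x - med) ** 2 for x in upper) / (2 * n)
    return math.sqrt(var_plus)


def select_action(quantiles_per_action, t, c=50.0):
    """Pick the action maximizing mean value plus the decayed upper-quantile bonus."""
    coeff = decay_coefficient(t, c)
    scores = [
        mean(q) + coeff * upper_quantile_bonus(q)
        for q in quantiles_per_action
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical quantile estimates for two actions.
example = [
    [1.0, 1.0, 1.0, 1.0],  # higher mean, no upper-tail spread
    [0.0, 0.0, 0.0, 3.0],  # lower mean, heavy upper tail
]
early_choice = select_action(example, t=2)       # bonus dominates early on
late_choice = select_action(example, t=10**9)    # bonus has decayed away
```

Under these assumptions, the heavy-tailed action is preferred early in training, when the coefficient is large, and the higher-mean action is preferred once the schedule has decayed.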


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Training Language Models to Self-Correct via Reinforcement Learning

    cs.LG 2024-09 unverdicted novelty 6.0

    SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.