pith. the verified trust layer for science. sign in

arxiv: 1906.07586 · v1 · pith:QHIVY42Znew · submitted 2019-06-18 · 💻 cs.LG · stat.ML

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

classification 💻 cs.LG stat.ML
keywords learningevaluationpolicyalgorithmanalysisefficientgap-increasinggrape
0
0 comments X p. Extension
Add this Pith Number to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{QHIVY42Z}

Prints a linked pith:QHIVY42Z badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

In real-world applications of reinforcement learning (RL), noise from inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which plays a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It leverages two recent ideas: (1) gap-increasing value update operators in advantage learning for noise-tolerance and (2) off-policy eligibility trace in Retrace algorithm for efficient learning. We provide detailed theoretical analysis of the new algorithm that shows its efficiency and noise-tolerance inherited from Retrace and advantage learning. Furthermore, our analysis shows that GRAPE's learning is significantly efficient than that of a simple learning-rate-based approach while keeping the same level of noise-tolerance. We applied GRAPE to control problems and obtained experimental results supporting our theoretical analysis.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.