arxiv: 1506.03379 · v2 · pith:QTTXG7EXnew · submitted 2015-06-10 · 💻 cs.LG · cs.AI

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

keywords: learning, lifelong, sequence, tasks, algorithm, problem, coupon-collector, finite

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL). Despite much encouraging empirical evidence, there has been little theoretical analysis. In this paper, we study a class of lifelong RL problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each of which is from a finite set of MDPs with the same state/action sets and different transition/reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online coupon-collector problem and give an optimal algorithm. This allows us to develop a new lifelong RL algorithm, whose overall sample complexity in a sequence of tasks is much smaller than single-task learning, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in simulated problems, including a recently introduced human-robot interaction problem.
