pith. machine review for the scientific record.

arxiv: 1309.6821 · v1 · submitted 2013-09-26 · 💻 cs.LG · stat.ML

Recognition: unknown

Sample Complexity of Multi-task Reinforcement Learning

classification 💻 cs.LG stat.ML
keywords algorithm · complexity · multi-task · reinforcement-learning · sample · tasks · transfer · per-task

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging and has a number of important applications. Though there is encouraging empirical evidence that transfer can improve performance in subsequent reinforcement-learning tasks, there has been very little theoretical analysis. In this paper, we introduce a new multi-task algorithm for a sequence of reinforcement-learning tasks when each task is sampled independently from an unknown distribution over a finite set of Markov decision processes whose parameters are initially unknown. For this setting, we prove under certain assumptions that the per-task sample complexity of exploration is reduced significantly due to transfer compared to standard single-task algorithms. Our multi-task algorithm also has the desired characteristic that it is guaranteed not to exhibit negative transfer: in the worst case its per-task sample complexity is comparable to that of the corresponding single-task algorithm.
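The task-generation setting described in the abstract can be illustrated with a minimal sketch. It covers only the setting (tasks drawn independently from a fixed distribution over a finite set of MDPs), not the paper's algorithm. The names `TabularMDP`, `sample_task_sequence`, `CANDIDATE_MDPS`, and `TASK_WEIGHTS`, as well as the two toy MDPs and their numbers, are hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TabularMDP:
    """A small tabular MDP (hypothetical container, not from the paper)."""
    name: str
    # transitions[s][a] is a tuple of next-state probabilities.
    transitions: tuple
    # rewards[s][a] is the expected immediate reward.
    rewards: tuple


# Two toy MDPs with 2 states and 2 actions each (illustrative values only).
MDP_A = TabularMDP(
    name="A",
    transitions=(((0.9, 0.1), (0.2, 0.8)), ((0.5, 0.5), (0.0, 1.0))),
    rewards=((0.0, 1.0), (0.5, 0.0)),
)
MDP_B = TabularMDP(
    name="B",
    transitions=(((0.1, 0.9), (0.7, 0.3)), ((1.0, 0.0), (0.4, 0.6))),
    rewards=((1.0, 0.0), (0.0, 0.5)),
)

CANDIDATE_MDPS = [MDP_A, MDP_B]
# The learner does not know these weights; they are visible here only
# because this sketch plays the role of the environment.
TASK_WEIGHTS = [0.7, 0.3]


def sample_task_sequence(mdps, weights, num_tasks, seed=0):
    """Draw num_tasks MDPs independently from the given distribution."""
    rng = random.Random(seed)
    return rng.choices(mdps, weights=weights, k=num_tasks)


tasks = sample_task_sequence(CANDIDATE_MDPS, TASK_WEIGHTS, num_tasks=5)
print([task.name for task in tasks])
```

In this sketch the learner would face each element of `tasks` in turn without being told which candidate MDP it is or what the weights are.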

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  2. Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

    cs.LG 2026-04 unverdicted novelty 7.0

    A low-rank matrix estimation method in a reward-free RL framework learns shared representations across linear MDPs and yields near-optimal policies with characterized regret bounds under relaxed feature assumptions.