Learning Shared Representations in Multi-task Reinforcement Learning

· 2016 · cs.AI · arXiv 1603.02041

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks, and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this shared structure as it arises in the state-action value space. We will show how one can jointly learn optimal value-functions by modifying the popular Value-Iteration and Policy-Iteration procedures to accommodate this shared representation assumption and leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures are able to infer good value functions, even under low-sample regimes. In addition to data efficiency, we will show in our analysis that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Evidence of an Emergent "Self" in Continual Robot Learning

cs.RO · 2026-03-25 · unverdicted · novelty 6.0

Continual learning robots form a significantly more stable invariant subnetwork than constant-task controls, and preserving it improves adaptation while damaging it hurts performance.

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

cs.LG · 2025-12-23 · unverdicted · novelty 6.0

Multitask offline fitted Q-iteration achieves 1/sqrt(nT) generalization rates under shared low-rank structure and reduces complexity for new tasks by reusing the upstream representation.

Multi-Task Representation Learning for Conservative Linear Bandits

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

CMTRL recovers a shared low-rank feature matrix for T constrained linear bandit tasks in d dimensions using Safe-AltGDmin and provides regret and sample complexity bounds.

citing papers explorer

Showing 3 of 3 citing papers.

Evidence of an Emergent "Self" in Continual Robot Learning

cs.RO · 2026-03-25 · unverdicted · none · ref 13 · internal anchor

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

cs.LG · 2025-12-23 · unverdicted · none · ref 4 · internal anchor

Multi-Task Representation Learning for Conservative Linear Bandits

cs.LG · 2026-05-12 · unverdicted · none · ref 16
